[jira] [Commented] (YARN-3187) Documentation of Capacity Scheduler Queue mapping based on user or group

2015-03-10 Thread Gururaj Shetty (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356394#comment-14356394
 ] 

Gururaj Shetty commented on YARN-3187:
--

Thanks [~jianhe] and [~Naganarasimha Garla] for committing and reviewing the 
patch.

> Documentation of Capacity Scheduler Queue mapping based on user or group
> 
>
> Key: YARN-3187
> URL: https://issues.apache.org/jira/browse/YARN-3187
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, documentation
>Affects Versions: 2.6.0
>Reporter: Naganarasimha G R
>Assignee: Gururaj Shetty
>  Labels: documentation
> Fix For: 2.7.0
>
> Attachments: YARN-3187.1.patch, YARN-3187.2.patch, YARN-3187.3.patch, 
> YARN-3187.4.patch
>
>
> YARN-2411 exposes a very useful feature, {{support simple user and group 
> mappings to queues}}, but it is not captured in the documentation. In this 
> JIRA we plan to document that feature (see the configuration sketch below).
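
For reference, a minimal sketch of the kind of mapping this feature supports, using 
the {{yarn.scheduler.capacity.queue-mappings}} property names from the 
CapacityScheduler configuration; the user, group and queue names here are made up 
for illustration:
{code}
import org.apache.hadoop.conf.Configuration;

public class QueueMappingSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // u:<user>:<queue> maps a user to a queue, g:<group>:<queue> maps a group
    conf.set("yarn.scheduler.capacity.queue-mappings",
        "u:alice:engineering,g:analysts:reports");
    // when true, a mapping overrides the queue specified at submission time
    conf.setBoolean("yarn.scheduler.capacity.queue-mappings-override.enable", false);
    System.out.println(conf.get("yarn.scheduler.capacity.queue-mappings"));
  }
}
{code}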



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3328) There's no way to rebuild containers Managed by NMClientAsync If AM restart

2015-03-10 Thread sandflee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356384#comment-14356384
 ] 

sandflee commented on YARN-3328:


Is it necessary to keep container info in NMClientAsync? YARN-3327 is 
also caused by this.

> There's no way to rebuild containers Managed by NMClientAsync If AM restart
> ---
>
> Key: YARN-3328
> URL: https://issues.apache.org/jira/browse/YARN-3328
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: api, applications, client
>Affects Versions: 2.6.0
>Reporter: sandflee
>
> If work preserving is enabled and the AM restarts, the AM can't stop containers 
> launched by the previous AM, because there's no corresponding container in 
> NMClientAsync.containers.
> There's no way to rebuild NMClientAsync.containers (see the sketch below).
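
For illustration, a minimal sketch (not from any patch) of the call a restarted AM 
would want to make for a container launched by the previous attempt; per the 
description above, the stop request cannot be handled normally because the 
restarted NMClientAsync has no entry for that container:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.client.api.async.NMClientAsync;

public class RestartedAmSketch {
  // containerId/nodeId identify a container started by the previous AM attempt
  static void stopOldContainer(Configuration conf, NMClientAsync.CallbackHandler handler,
      ContainerId containerId, NodeId nodeId) {
    NMClientAsync nmClient = NMClientAsync.createNMClientAsync(handler);
    nmClient.init(conf);
    nmClient.start();
    // NMClientAsync.containers has no entry for this container after the restart,
    // so the stop cannot be processed like a stop of a container it manages
    nmClient.stopContainerAsync(containerId, nodeId);
  }
}
{code}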



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3295) Fix documentation nits found in markdown conversion

2015-03-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356373#comment-14356373
 ] 

Hudson commented on YARN-3295:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7303 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7303/])
YARN-3295. Fix documentation nits found in markdown conversion. Contributed by 
Masatake Iwasaki. (ozawa: rev 30c428a858c179645d6dc82b7027f6b7e871b439)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRestart.md
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnCommands.md


> Fix documentation nits found in markdown conversion
> ---
>
> Key: YARN-3295
> URL: https://issues.apache.org/jira/browse/YARN-3295
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Trivial
> Fix For: 2.7.0
>
> Attachments: YARN-3295.001.patch
>
>
> * In the ResourceManagerRestart page, inside the Notes, the "_e{epoch}_" was 
> highlighted before but is not now.
> * yarn container command
> {noformat}
> list ApplicationId (should be Application Attempt ID ?)
> Lists containers for the application attempt.
> {noformat}
> * yarn application attempt command
> {noformat}
> list ApplicationId
> Lists applications attempts from the RM (should be Lists applications 
> attempts for the given application)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3329) There's no way to rebuild containers Managed by NMClientAsync If AM restart

2015-03-10 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K resolved YARN-3329.
-
  Resolution: Duplicate
Release Note:   (was: the same to YARN-3328, sorry for creating twice)

> There's no way to rebuild containers Managed by NMClientAsync If AM restart
> ---
>
> Key: YARN-3329
> URL: https://issues.apache.org/jira/browse/YARN-3329
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: api, applications, client
>Affects Versions: 2.6.0
>Reporter: sandflee
>
> If work preserving is enabled and the AM restarts, the AM can't stop containers or 
> query the status of containers launched by the previous AM, because there's no 
> corresponding container in NMClientAsync.containers.
> And there's no way to rebuild NMClientAsync.containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (YARN-3329) There's no way to rebuild containers Managed by NMClientAsync If AM restart

2015-03-10 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K reopened YARN-3329:
-

> There's no way to rebuild containers Managed by NMClientAsync If AM restart
> ---
>
> Key: YARN-3329
> URL: https://issues.apache.org/jira/browse/YARN-3329
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: api, applications, client
>Affects Versions: 2.6.0
>Reporter: sandflee
>
> If work preserving is enabled and the AM restarts, the AM can't stop containers or 
> query the status of containers launched by the previous AM, because there's no 
> corresponding container in NMClientAsync.containers.
> And there's no way to rebuild NMClientAsync.containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3295) Fix documentation nits found in markdown conversion

2015-03-10 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356358#comment-14356358
 ] 

Tsuyoshi Ozawa commented on YARN-3295:
--

+1

> Fix documentation nits found in markdown conversion
> ---
>
> Key: YARN-3295
> URL: https://issues.apache.org/jira/browse/YARN-3295
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Trivial
> Attachments: YARN-3295.001.patch
>
>
> * In the ResourceManagerRestart page, inside the Notes, the "_e{epoch}_" was 
> highlighted before but is not now.
> * yarn container command
> {noformat}
> list ApplicationId (should be Application Attempt ID ?)
> Lists containers for the application attempt.
> {noformat}
> * yarn application attempt command
> {noformat}
> list ApplicationId
> Lists applications attempts from the RM (should be Lists applications 
> attempts for the given application)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1884) ContainerReport should have nodeHttpAddress

2015-03-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356343#comment-14356343
 ] 

Hadoop QA commented on YARN-1884:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12703734/YARN-1884.3.patch
  against trunk revision 5c1036d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.client.api.impl.TestYarnClient

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6913//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6913//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6913//console

This message is automatically generated.

> ContainerReport should have nodeHttpAddress
> ---
>
> Key: YARN-1884
> URL: https://issues.apache.org/jira/browse/YARN-1884
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Xuan Gong
> Attachments: YARN-1884.1.patch, YARN-1884.2.patch, YARN-1884.3.patch
>
>
> In the web UI, we're going to show the node, which used to link to the NM 
> web page. However, on the AHS web UI, and on the RM web UI after YARN-1809, the 
> node field has to be set to the nodeID where the container is allocated. We need 
> to add nodeHttpAddress to the ContainerReport to link users to the NM web page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI

2015-03-10 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356331#comment-14356331
 ] 

Devaraj K commented on YARN-3225:
-

bq. not to pass the timeout value to the YARN RM side but to handle it properly on 
the RMAdmin side, which makes things much simpler
Here, what would happen to the decommissioning node if RMAdmin issues 
refreshNodeGracefully() and gets terminated (exits) before issuing the 
'refreshNode forcefully'? This can happen if the user hits Ctrl+C at the command 
prompt. The node would then stay in the decommissioning state forever and become 
unusable for new container allocations.

> New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
> ---
>
> Key: YARN-3225
> URL: https://issues.apache.org/jira/browse/YARN-3225
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Devaraj K
> Attachments: YARN-3225.patch, YARN-914.patch
>
>
> A new CLI (or an existing CLI with new parameters) should put each node on the 
> decommission list into decommissioning status and track a timeout to terminate 
> the nodes that haven't finished.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1884) ContainerReport should have nodeHttpAddress

2015-03-10 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356325#comment-14356325
 ] 

Zhijie Shen commented on YARN-1884:
---

[~xgong], sorry for raising the issue late, but I forgot a compatibility issue. 
It's possible that the timeline server is upgraded to the new version while the 
stored data is old, such that the nodeHttpAddress info is not available. In this 
case, the CLI will show "null", which is not user-friendly. Can we do 
something similar to
{code}
  if (usageReport != null) {
    //completed app report in the timeline server doesn't have usage report
    appReportStr.print(usageReport.getMemorySeconds() + " MB-seconds, ");
    appReportStr.println(usageReport.getVcoreSeconds() + " vcore-seconds");
  } else {
    appReportStr.println("N/A");
  }
{code}
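
For example, a hypothetical sketch of the analogous guard for the container report, 
assuming the {{getNodeHttpAddress()}} accessor this patch adds; {{containerReport}} 
and {{containerReportStr}} stand in for the locals in the CLI code:
{code}
    String nodeHttpAddress = containerReport.getNodeHttpAddress();
    // container reports stored by an older timeline server won't carry the address
    containerReportStr.print("\tNodeHttpAddress : ");
    containerReportStr.println(nodeHttpAddress != null ? nodeHttpAddress : "N/A");
{code}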

> ContainerReport should have nodeHttpAddress
> ---
>
> Key: YARN-1884
> URL: https://issues.apache.org/jira/browse/YARN-1884
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Xuan Gong
> Attachments: YARN-1884.1.patch, YARN-1884.2.patch, YARN-1884.3.patch
>
>
> In the web UI, we're going to show the node, which used to link to the NM 
> web page. However, on the AHS web UI, and on the RM web UI after YARN-1809, the 
> node field has to be set to the nodeID where the container is allocated. We need 
> to add nodeHttpAddress to the ContainerReport to link users to the NM web page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3080) The DockerContainerExecutor could not write the right pid to container pidFile

2015-03-10 Thread Abin Shahab (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356323#comment-14356323
 ] 

Abin Shahab commented on YARN-3080:
---

[~vinodkv], Do you think you can review this? I have several other patches 
which are dependent on this.
Thanks!

> The DockerContainerExecutor could not write the right pid to container pidFile
> --
>
> Key: YARN-3080
> URL: https://issues.apache.org/jira/browse/YARN-3080
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Beckham007
>Assignee: Abin Shahab
> Attachments: YARN-3080.patch, YARN-3080.patch, YARN-3080.patch, 
> YARN-3080.patch
>
>
> The docker_container_executor_session.sh is like this:
> {quote}
> #!/usr/bin/env bash
> echo `/usr/bin/docker inspect --format {{.State.Pid}} 
> container_1421723685222_0008_01_02` > 
> /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp
> /bin/mv -f 
> /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid.tmp
>  
> /data/nm_restart/hadoop-2.4.1/data/yarn/local/nmPrivate/application_1421723685222_0008/container_1421723685222_0008_01_02/container_1421723685222_0008_01_02.pid
> /usr/bin/docker run --rm  --name container_1421723685222_0008_01_02 -e 
> GAIA_HOST_IP=c162 -e GAIA_API_SERVER=10.6.207.226:8080 -e 
> GAIA_CLUSTER_ID=shpc-nm_restart -e GAIA_QUEUE=root.tdwadmin -e 
> GAIA_APP_NAME=test_nm_docker -e GAIA_INSTANCE_ID=1 -e 
> GAIA_CONTAINER_ID=container_1421723685222_0008_01_02 --memory=32M 
> --cpu-shares=1024 -v 
> /data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/container-logs/application_1421723685222_0008/container_1421723685222_0008_01_02
>  -v 
> /data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02:/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02
>  -P -e A=B --privileged=true docker.oa.com:8080/library/centos7 bash 
> "/data/nm_restart/hadoop-2.4.1/data/yarn/local/usercache/tdwadmin/appcache/application_1421723685222_0008/container_1421723685222_0008_01_02/launch_container.sh"
> {quote}
> The DockerContainerExecutor runs docker inspect before docker run, so docker 
> inspect can't get the right PID for the container; as a result, signalContainer() 
> and NM restart would fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3332) [Umbrella] Unified Resource Statistics Collection per node

2015-03-10 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356309#comment-14356309
 ] 

Zhijie Shen commented on YARN-3332:
---

It sounds like a great proposal, thanks Vinod! I had a quick thought about the 
publishing channel for the collected statistics. I'm not sure how different the 
access pattern would be, but just thinking out loud: is it possible to reuse the 
timeline service to distribute the node statistics, getting rid of maintaining 
different but similar interfaces (or multiple data flow channels)? One step 
further, we could make the timeline service the main bus to transmit metrics from 
A to B.

> [Umbrella] Unified Resource Statistics Collection per node
> --
>
> Key: YARN-3332
> URL: https://issues.apache.org/jira/browse/YARN-3332
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
> Attachments: Design - UnifiedResourceStatisticsCollection.pdf
>
>
> Today in YARN, the NodeManager collects statistics like per-container resource 
> usage and the overall physical resources available on the machine. Currently this 
> is used internally in YARN by the NodeManager only for limited purposes: 
> automatically determining the capacity of resources on the node and enforcing 
> memory usage to what is reserved per container.
> This proposal is to extend the existing architecture and collect statistics 
> for usage beyond the existing use cases.
> Proposal attached in comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2784) Make POM project names consistent

2015-03-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356305#comment-14356305
 ] 

Hadoop QA commented on YARN-2784:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12703800/0002-YARN-2784.patch
  against trunk revision a5cf985.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs-plugins
 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask
 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6910//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6910//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6910//console

This message is automatically generated.

> Make POM project names consistent
> -
>
> Key: YARN-2784
> URL: https://issues.apache.org/jira/browse/YARN-2784
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: build
>Reporter: Rohith
>Assignee: Rohith
>Priority: Minor
> Attachments: 0002-YARN-2784.patch, YARN-2784.patch
>
>
> All YARN and MapReduce pom.xml files have the project name set to 
> hadoop-mapreduce/hadoop-yarn. This can be made consistent across the Hadoop 
> projects build, like 'Apache Hadoop YARN ...' and 'Apache Hadoop 
> MapReduce ...'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1884) ContainerReport should have nodeHttpAddress

2015-03-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356303#comment-14356303
 ] 

Hadoop QA commented on YARN-1884:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12703734/YARN-1884.3.patch
  against trunk revision 5c1036d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6911//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6911//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6911//console

This message is automatically generated.

> ContainerReport should have nodeHttpAddress
> ---
>
> Key: YARN-1884
> URL: https://issues.apache.org/jira/browse/YARN-1884
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Xuan Gong
> Attachments: YARN-1884.1.patch, YARN-1884.2.patch, YARN-1884.3.patch
>
>
> In web UI, we're going to show the node, which used to be to link to the NM 
> web page. However, on AHS web UI, and RM web UI after YARN-1809, the node 
> field has to be set to nodeID where the container is allocated. We need to 
> add nodeHttpAddress to the containerReport to link users to NM web page



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3330) Implement a protobuf compatibility checker to check if a patch breaks the compatibility with existing client and internal protocols

2015-03-10 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3330:

Attachment: pdiff_patch.py

Attaching a very simple Python tool to check if there are any incompatible changes 
in protobuf files. This DFA-based tool checks if an incoming patch has any 
non-trivial removals in our existing protobuf files, and reports an error if there 
is a removal. It also checks if there is any new content in protobuf files, 
and raises a warning for further (human) investigation. This is the very first 
step toward automatic rolling-upgrade compatibility verification. On the 
protobuf side, we may want to:

# understand file insertions/removals, on top of line-by-line verification
# understand the roles of different protobufs and treat them separately

Any other suggestions are more than welcome. 
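
For illustration, a minimal Java sketch of the same idea (the attached tool itself 
is the Python script pdiff_patch.py); it only scans a unified diff and flags 
removals and additions in .proto files:
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ProtoPatchCheck {
  public static void main(String[] args) throws IOException {
    boolean inProto = false;
    for (String line : Files.readAllLines(Paths.get(args[0]))) {
      if (line.startsWith("+++ ")) {            // diff header naming the patched file
        inProto = line.trim().endsWith(".proto");
      } else if (inProto && line.startsWith("-") && !line.startsWith("---")) {
        System.out.println("ERROR: removal in protobuf file: " + line);
      } else if (inProto && line.startsWith("+") && !line.startsWith("+++")) {
        System.out.println("WARNING: new protobuf content, needs review: " + line);
      }
    }
  }
}
{code}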

> Implement a protobuf compatibility checker to check if a patch breaks the 
> compatibility with existing client and internal protocols
> ---
>
> Key: YARN-3330
> URL: https://issues.apache.org/jira/browse/YARN-3330
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Li Lu
>Assignee: Li Lu
> Attachments: pdiff_patch.py
>
>
> Per YARN-3292, we may want to start YARN rolling upgrade test compatibility 
> verification tool by a simple script to check protobuf compatibility. The 
> script may work on incoming patch files, check if there are any changes to 
> protobuf files, and report any potentially incompatible changes (line 
> removals, etc,.). We may want the tool to be conservative: it may report 
> false positives, but we should minimize its chance to have false negatives. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3333) rename TimelineAggregator etc. to TimelineCollector

2015-03-10 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356274#comment-14356274
 ] 

Sangjin Lee commented on YARN-3333:
---

I'll put it on hold until YARN-3039 is done.

> rename TimelineAggregator etc. to TimelineCollector
> ---
>
> Key: YARN-3333
> URL: https://issues.apache.org/jira/browse/YARN-3333
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-3333.001.patch
>
>
> Per discussions on YARN-2928, let's rename TimelineAggregator, etc. to 
> TimelineCollector, etc.
> There are also several minor issues on the current branch, which can be fixed 
> as part of this:
> - fixing some imports
> - missing license in TestTimelineServerClientIntegration.java
> - whitespaces
> - missing direct dependency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1

2015-03-10 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356271#comment-14356271
 ] 

Sangjin Lee commented on YARN-2928:
---

No worries. I'll wait until YARN-3039 is done. Thanks for letting me know.

> Application Timeline Server (ATS) next gen: phase 1
> ---
>
> Key: YARN-2928
> URL: https://issues.apache.org/jira/browse/YARN-2928
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
> Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal 
> v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx
>
>
> We have the application timeline server implemented in yarn per YARN-1530 and 
> YARN-321. Although it is a great feature, we have recognized several critical 
> issues and features that need to be addressed.
> This JIRA proposes the design and implementation changes to address those. 
> This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3210) [Source organization] Refactor timeline aggregator according to new code organization

2015-03-10 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356272#comment-14356272
 ] 

Zhijie Shen commented on YARN-3210:
---

bq. Yes, it's currently an auxiliary service to the NM, but it can easily be 
started up as a standalone daemon.

I think the idea behind it is that all the node-level aggregator logic should 
be inside TimelineAggregatorsCollection. In aux-service mode, 
TimelineAggregatorsCollection is started by 
PerNodeTimelineAggregatorAuxService, while in stand-alone mode, 
TimelineAggregatorsCollection is started as a separate process.

The problem seems to be that the way to start a logical app-level aggregator is 
not decoupled from the aux service. To make it common regardless of whether the 
aggregator runs in an aux service, a standalone process or a container, starting 
the app-level aggregator could be treated as an IPC call in the Aggregator<->NM 
communication protocol.

> [Source organization] Refactor timeline aggregator according to new code 
> organization
> -
>
> Key: YARN-3210
> URL: https://issues.apache.org/jira/browse/YARN-3210
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Li Lu
>Assignee: Li Lu
>  Labels: refactor
> Fix For: YARN-2928
>
> Attachments: YARN-3210-022715.patch, YARN-3210-030215.patch, 
> YARN-3210-030215_1.patch, YARN-3210-030215_2.patch
>
>
> We may want to refactor the code of timeline aggregator according to the 
> discussion of YARN-3166, the code organization for timeline service v2. We 
> need to refactor the code after we reach an agreement on the aggregator part 
> of YARN-3166. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3295) Fix documentation nits found in markdown conversion

2015-03-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356266#comment-14356266
 ] 

Hadoop QA commented on YARN-3295:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12703825/YARN-3295.001.patch
  against trunk revision 5c1036d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6912//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6912//console

This message is automatically generated.

> Fix documentation nits found in markdown conversion
> ---
>
> Key: YARN-3295
> URL: https://issues.apache.org/jira/browse/YARN-3295
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Trivial
> Attachments: YARN-3295.001.patch
>
>
> * In the ResourceManagerRestart page, inside the Notes, the "_e{epoch}_" was 
> highlighted before but is not now.
> * yarn container command
> {noformat}
> list ApplicationId (should be Application Attempt ID ?)
> Lists containers for the application attempt.
> {noformat}
> * yarn application attempt command
> {noformat}
> list ApplicationId
> Lists applications attempts from the RM (should be Lists applications 
> attempts for the given application)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1

2015-03-10 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356263#comment-14356263
 ] 

Li Lu commented on YARN-2928:
-

bq. can we defer the renaming work until that patch gets in?
I'm +1 on this suggestion. When we committed YARN-3210 back, there was some work 
interference that delayed YARN-3264. This time we probably want to have 
less interference with the ongoing aggregator-related (to-be-renamed) work. 

> Application Timeline Server (ATS) next gen: phase 1
> ---
>
> Key: YARN-2928
> URL: https://issues.apache.org/jira/browse/YARN-2928
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
> Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal 
> v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx
>
>
> We have the application timeline server implemented in yarn per YARN-1530 and 
> YARN-321. Although it is a great feature, we have recognized several critical 
> issues and features that need to be addressed.
> This JIRA proposes the design and implementation changes to address those. 
> This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3332) [Umbrella] Unified Resource Statistics Collection per node

2015-03-10 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356260#comment-14356260
 ] 

Vinod Kumar Vavilapalli commented on YARN-3332:
---

I chose the service model because the machine-level big picture is fragmented 
between YARN and HDFS (and HBase etc.); having a lower-level common statistics 
layer is useful.

I anyway needed a service to expose an API for both admins/users as well as 
external systems beyond HDFS; I can imagine tools being built on top of 
this.

That said, it doesn't need to be a service or a library. I can think of a library 
that wires into the exposed API, though I haven't found uses for that yet.

> [Umbrella] Unified Resource Statistics Collection per node
> --
>
> Key: YARN-3332
> URL: https://issues.apache.org/jira/browse/YARN-3332
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
> Attachments: Design - UnifiedResourceStatisticsCollection.pdf
>
>
> Today in YARN, the NodeManager collects statistics like per-container resource 
> usage and the overall physical resources available on the machine. Currently this 
> is used internally in YARN by the NodeManager only for limited purposes: 
> automatically determining the capacity of resources on the node and enforcing 
> memory usage to what is reserved per container.
> This proposal is to extend the existing architecture and collect statistics 
> for usage beyond the existing use cases.
> Proposal attached in comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3332) [Umbrella] Unified Resource Statistics Collection per node

2015-03-10 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356261#comment-14356261
 ] 

Vinod Kumar Vavilapalli commented on YARN-3332:
---

Agreed, this should be entirely possible.

> [Umbrella] Unified Resource Statistics Collection per node
> --
>
> Key: YARN-3332
> URL: https://issues.apache.org/jira/browse/YARN-3332
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
> Attachments: Design - UnifiedResourceStatisticsCollection.pdf
>
>
> Today in YARN, the NodeManager collects statistics like per-container resource 
> usage and the overall physical resources available on the machine. Currently this 
> is used internally in YARN by the NodeManager only for limited purposes: 
> automatically determining the capacity of resources on the node and enforcing 
> memory usage to what is reserved per container.
> This proposal is to extend the existing architecture and collect statistics 
> for usage beyond the existing use cases.
> Proposal attached in comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3240) [Data Mode] Implement client API to put generic entities

2015-03-10 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356257#comment-14356257
 ] 

Zhijie Shen commented on YARN-3240:
---

bq. We should strive to preserve the state where v.2 can be disabled without 
affecting v.1 whatsoever, and vice versa

Yeah, the newly added v2 APIs don't affect the existing v1 APIs. As mentioned 
before, we do it in the same class because most of the HTTP-related code can be 
reused, which is actually the major body of the work, while the specific logic of 
putting the entity is a simple call.

> [Data Mode] Implement client API to put generic entities
> 
>
> Key: YARN-3240
> URL: https://issues.apache.org/jira/browse/YARN-3240
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Fix For: YARN-2928
>
> Attachments: YARN-3240.1.patch, YARN-3240.2.patch, YARN-3240.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI

2015-03-10 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356255#comment-14356255
 ] 

Junping Du commented on YARN-3225:
--

Thanks [~devaraj.k]. Sorry for missing an important comment on the current patch: 
from the many discussions in the umbrella JIRA (YARN-914), most of us prefer not to 
pass the timeout value to the YARN RM side but to handle it properly on the RMAdmin 
side, which makes things much simpler (please check the discussion there for more 
details). So the CLI with the -g option could be a blocking CLI until all nodes get 
decommissioned or the timeout is reached.
Pseudo logic in this CLI should be something like: 
{code}
refreshNodeGracefully(); - mark node as decommissioning
while (!timeout && some nodes still in decommissioning) {
  checkStatusForDecommissioningNodes();
}
if (timeout) {
  refreshNode forcefully for remaining nodes
}
{code}
Thoughts?
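
As a concrete illustration of the pseudo logic above, a minimal Java sketch of the 
blocking client-side flow; refreshNodesGracefully(), hasDecommissioningNodes() and 
refreshNodesForcefully() are hypothetical placeholders for the RMAdmin-side calls, 
not existing APIs:
{code}
public class GracefulDecommissionSketch {
  void decommissionGracefully(long timeoutMs, long pollMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    refreshNodesGracefully();                    // mark listed nodes as decommissioning
    while (System.currentTimeMillis() < deadline && hasDecommissioningNodes()) {
      Thread.sleep(pollMs);                      // poll node states from the client side
    }
    if (hasDecommissioningNodes()) {
      refreshNodesForcefully();                  // force-decommission the remaining nodes
    }
  }

  // hypothetical placeholders standing in for the RMAdmin operations
  void refreshNodesGracefully() { }
  boolean hasDecommissioningNodes() { return false; }
  void refreshNodesForcefully() { }
}
{code}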

> New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
> ---
>
> Key: YARN-3225
> URL: https://issues.apache.org/jira/browse/YARN-3225
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Devaraj K
> Attachments: YARN-3225.patch, YARN-914.patch
>
>
> New CLI (or existing CLI with parameters) should put each node on 
> decommission list to decommissioning status and track timeout to terminate 
> the nodes that haven't get finished.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3295) Fix documentation nits found in markdown conversion

2015-03-10 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356242#comment-14356242
 ] 

Masatake Iwasaki commented on YARN-3295:


The patch applies to branch-2 too.

> Fix documentation nits found in markdown conversion
> ---
>
> Key: YARN-3295
> URL: https://issues.apache.org/jira/browse/YARN-3295
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Trivial
> Attachments: YARN-3295.001.patch
>
>
> * In the ResourceManagerRestart page, inside the Notes, the "_e{epoch}_" was 
> highlighted before but is not now.
> * yarn container command
> {noformat}
> list ApplicationId (should be Application Attempt ID ?)
> Lists containers for the application attempt.
> {noformat}
> * yarn application attempt command
> {noformat}
> list ApplicationId
> Lists applications attempts from the RM (should be Lists applications 
> attempts for the given application)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-314) Schedulers should allow resource requests of different sizes at the same priority and location

2015-03-10 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356240#comment-14356240
 ] 

Tsuyoshi Ozawa commented on YARN-314:
-

[~grey] Feel free to assign it to yourself!

> Schedulers should allow resource requests of different sizes at the same 
> priority and location
> --
>
> Key: YARN-314
> URL: https://issues.apache.org/jira/browse/YARN-314
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 2.0.2-alpha
>Reporter: Sandy Ryza
> Attachments: yarn-314-prelim.patch
>
>
> Currently, resource requests for the same container and locality are expected 
> to all be the same size.
> While it doesn't look like it's needed for apps currently, and can be 
> circumvented by specifying different priorities if absolutely necessary, it 
> seems to me that the ability to request containers with different resource 
> requirements at the same priority level should be there for the future and 
> for completeness' sake (see the sketch below).
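
For example, a minimal sketch of what an AM would want to express (two different 
container sizes at the same priority and location), which the schedulers do not 
allow today:
{code}
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class MixedSizeRequestsSketch {
  public static void main(String[] args) {
    Priority priority = Priority.newInstance(1);
    // two requests at the same priority and location (ANY) but with different sizes
    ResourceRequest small = ResourceRequest.newInstance(
        priority, ResourceRequest.ANY, Resource.newInstance(1024, 1), 3);
    ResourceRequest large = ResourceRequest.newInstance(
        priority, ResourceRequest.ANY, Resource.newInstance(4096, 4), 1);
    System.out.println(small + "\n" + large);
  }
}
{code}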



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-314) Schedulers should allow resource requests of different sizes at the same priority and location

2015-03-10 Thread Lei Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356238#comment-14356238
 ] 

Lei Guo commented on YARN-314:
--

Is anybody looking at this one? Maybe I can take a look at it.

> Schedulers should allow resource requests of different sizes at the same 
> priority and location
> --
>
> Key: YARN-314
> URL: https://issues.apache.org/jira/browse/YARN-314
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 2.0.2-alpha
>Reporter: Sandy Ryza
> Attachments: yarn-314-prelim.patch
>
>
> Currently, resource requests for the same container and locality are expected 
> to all be the same size.
> While it doesn't look like it's needed for apps currently, and can be 
> circumvented by specifying different priorities if absolutely necessary, it 
> seems to me that the ability to request containers with different resource 
> requirements at the same priority level should be there for the future and 
> for completeness' sake.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1884) ContainerReport should have nodeHttpAddress

2015-03-10 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356237#comment-14356237
 ] 

Xuan Gong commented on YARN-1884:
-

The result does not look right. Kicking Jenkins again.

> ContainerReport should have nodeHttpAddress
> ---
>
> Key: YARN-1884
> URL: https://issues.apache.org/jira/browse/YARN-1884
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Xuan Gong
> Attachments: YARN-1884.1.patch, YARN-1884.2.patch, YARN-1884.3.patch
>
>
> In the web UI, we're going to show the node, which used to link to the NM 
> web page. However, on the AHS web UI, and on the RM web UI after YARN-1809, the 
> node field has to be set to the nodeID where the container is allocated. We need 
> to add nodeHttpAddress to the ContainerReport to link users to the NM web page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1

2015-03-10 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356193#comment-14356193
 ] 

Junping Du commented on YARN-2928:
--

bq. let me know if you are OK with the name, and I can make a quick refactoring 
patch.
I have an outstanding patch in YARN-3039 up for review now. [~sjlee0], can we 
defer the renaming work until that patch gets in? Thanks!

> Application Timeline Server (ATS) next gen: phase 1
> ---
>
> Key: YARN-2928
> URL: https://issues.apache.org/jira/browse/YARN-2928
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
> Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal 
> v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx
>
>
> We have the application timeline server implemented in yarn per YARN-1530 and 
> YARN-321. Although it is a great feature, we have recognized several critical 
> issues and features that need to be addressed.
> This JIRA proposes the design and implementation changes to address those. 
> This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery

2015-03-10 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-3039:
-
Attachment: YARN-3039-v4.patch

Uploading the v4 patch with the necessary unit tests, especially an end-to-end 
test. This patch is ready for review now. 
What's new since the v3 patch:
- Added a callback in AMRMClient (async) for aggregator address updates
- Added retry logic in TimelineClient for service discovery in the v2 case 
- Made the put/post entities call in the DistributedShell AM non-blocking (for the 
v2 case) so it won't block other core logic
- In the v2 case, TimelineClient no longer gets the aggregator address from 
configuration but discovers it automatically. Verified that it works end-to-end 
with TestDistributedShell.

> [Aggregator wireup] Implement ATS app-appgregator service discovery
> ---
>
> Key: YARN-3039
> URL: https://issues.apache.org/jira/browse/YARN-3039
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Junping Du
> Attachments: Service Binding for applicationaggregator of ATS 
> (draft).pdf, Service Discovery For Application Aggregator of ATS (v2).pdf, 
> YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch, 
> YARN-3039-v3-core-changes-only.patch, YARN-3039-v4.patch
>
>
> Per design in YARN-2928, implement ATS writer service discovery. This is 
> essential for off-node clients to send writes to the right ATS writer. This 
> should also handle the case of AM failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3295) Fix documentation nits found in markdown conversion

2015-03-10 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated YARN-3295:
---
Attachment: YARN-3295.001.patch

Attaching a patch for trunk.

> Fix documentation nits found in markdown conversion
> ---
>
> Key: YARN-3295
> URL: https://issues.apache.org/jira/browse/YARN-3295
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Trivial
> Attachments: YARN-3295.001.patch
>
>
> * In the ResourceManagerRestart page, inside the Notes, the "_e{epoch}_" was 
> highlighted before but is not now.
> * yarn container command
> {noformat}
> list ApplicationId (should be Application Attempt ID ?)
> Lists containers for the application attempt.
> {noformat}
> * yarn application attempt command
> {noformat}
> list ApplicationId
> Lists applications attempts from the RM (should be Lists applications 
> attempts for the given application)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery

2015-03-10 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356179#comment-14356179
 ] 

Junping Du commented on YARN-3039:
--

Hi [~sjlee0], thanks for the comments above. I think we are on the same page now. 
Please check the v2 proposal attached.

> [Aggregator wireup] Implement ATS app-appgregator service discovery
> ---
>
> Key: YARN-3039
> URL: https://issues.apache.org/jira/browse/YARN-3039
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Junping Du
> Attachments: Service Binding for applicationaggregator of ATS 
> (draft).pdf, Service Discovery For Application Aggregator of ATS (v2).pdf, 
> YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch, 
> YARN-3039-v3-core-changes-only.patch
>
>
> Per design in YARN-2928, implement ATS writer service discovery. This is 
> essential for off-node clients to send writes to the right ATS writer. This 
> should also handle the case of AM failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2784) Make POM project names consistent

2015-03-10 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356132#comment-14356132
 ] 

Sean Busbey commented on YARN-2784:
---

{quote}
I have a really hard time believing this patch broke all those tests. lol
{quote}

I think it's just that the test infra must not be robust enough for the test 
load. Most patches don't hit so many modules, so they're less likely to see as 
many random failures. (or maybe there's an error in the module ordering?)

> Make POM project names consistent
> -
>
> Key: YARN-2784
> URL: https://issues.apache.org/jira/browse/YARN-2784
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: build
>Reporter: Rohith
>Assignee: Rohith
>Priority: Minor
> Attachments: 0002-YARN-2784.patch, YARN-2784.patch
>
>
> All YARN and MapReduce pom.xml files have the project name set to 
> hadoop-mapreduce/hadoop-yarn. This can be made consistent across the Hadoop 
> projects build, like 'Apache Hadoop YARN ...' and 'Apache Hadoop 
> MapReduce ...'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3328) There's no way to rebuild containers Managed by NMClientAsync If AM restart

2015-03-10 Thread sandflee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sandflee updated YARN-3328:
---
Description: 
If work preserving is enabled and the AM restarts, the AM can't stop containers 
launched by the previous AM, because there's no corresponding container in 
NMClientAsync.containers.
 There's no way to rebuild NMClientAsync.containers.


  was:
If work preserving is enabled and the AM restarts, the AM can't stop containers or 
query the status of containers launched by the previous AM, because there's no 
corresponding container in NMClientAsync.containers.
 There's no way to rebuild NMClientAsync.containers.



> There's no way to rebuild containers Managed by NMClientAsync If AM restart
> ---
>
> Key: YARN-3328
> URL: https://issues.apache.org/jira/browse/YARN-3328
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: api, applications, client
>Affects Versions: 2.6.0
>Reporter: sandflee
>
> If work preserving is enabled and the AM restarts, the AM can't stop containers 
> launched by the previous AM, because there's no corresponding container in 
> NMClientAsync.containers.
> There's no way to rebuild NMClientAsync.containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2280) Resource manager web service fields are not accessible

2015-03-10 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356056#comment-14356056
 ] 

Vinod Kumar Vavilapalli commented on YARN-2280:
---

Why is this only committed to trunk? Why isn't it committed to branch-2, given 
that it's a compatible change? This makes the release manager's life extremely 
difficult.

> Resource manager web service fields are not accessible
> --
>
> Key: YARN-2280
> URL: https://issues.apache.org/jira/browse/YARN-2280
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.4.0, 2.4.1
>Reporter: Krisztian Horvath
>Assignee: Krisztian Horvath
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: YARN-2280.patch
>
>
> Using the resource manager's REST API 
> (org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices), some 
> REST calls return a class whose fields cannot be accessed after unmarshalling, 
> for example SchedulerTypeInfo -> schedulerInfo. Using the same classes on the 
> client side, these fields are only accessible via reflection (see the sketch below).
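
For illustration, a minimal sketch of the reflection workaround the reporter 
describes; the {{typeInfo}} object is assumed to be what the client unmarshalled 
from the /ws/v1/cluster/scheduler response:
{code}
import java.lang.reflect.Field;
import org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.SchedulerInfo;
import org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.SchedulerTypeInfo;

public class SchedulerInfoAccessSketch {
  static SchedulerInfo getSchedulerInfo(SchedulerTypeInfo typeInfo) throws Exception {
    // the field has no public getter, so the client has to reach in reflectively
    Field field = SchedulerTypeInfo.class.getDeclaredField("schedulerInfo");
    field.setAccessible(true);
    return (SchedulerInfo) field.get(typeInfo);
  }
}
{code}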



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2784) Make POM project names consistent

2015-03-10 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356039#comment-14356039
 ] 

Rohith commented on YARN-2784:
--

Kindly review the updated patch. 

> Make POM project names consistent
> -
>
> Key: YARN-2784
> URL: https://issues.apache.org/jira/browse/YARN-2784
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: build
>Reporter: Rohith
>Assignee: Rohith
>Priority: Minor
> Attachments: 0002-YARN-2784.patch, YARN-2784.patch
>
>
> All YARN and MapReduce pom.xml files have the project name set to 
> hadoop-mapreduce/hadoop-yarn. This can be made consistent across the Hadoop 
> projects build, like 'Apache Hadoop YARN ...' and 'Apache Hadoop 
> MapReduce ...'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2784) Make POM project names consistent

2015-03-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356038#comment-14356038
 ] 

Hadoop QA commented on YARN-2784:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12678486/YARN-2784.patch
  against trunk revision 64eb068.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs-plugins
 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask
 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy:

  org.apache.hadoop.mapred.TestMRTimelineEventHandling
  org.apache.hadoop.mapreduce.TestLargeSort
  org.apache.hadoop.mapreduce.v2.TestMRJobs

  The following test timeouts occurred in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs-plugins
 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask
 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-pr

[jira] [Commented] (YARN-2784) Make POM project names consistent

2015-03-10 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356035#comment-14356035
 ] 

Rohith commented on YARN-2784:
--

The names below are from the updated patch.
{noformat}
[INFO] Apache Hadoop YARN  SUCCESS [  0.002 s]
[INFO] Apache Hadoop YARN API  SUCCESS [  0.003 s]
[INFO] Apache Hadoop YARN Common . SUCCESS [  0.005 s]
[INFO] Apache Hadoop YARN Server . SUCCESS [  0.003 s]
[INFO] Apache Hadoop YARN Server Common .. SUCCESS [  0.004 s]
[INFO] Apache Hadoop YARN NodeManager  SUCCESS [  0.005 s]
[INFO] Apache Hadoop YARN Web Proxy .. SUCCESS [  0.005 s]
[INFO] Apache Hadoop YARN ApplicationHistoryService .. SUCCESS [  0.005 s]
[INFO] Apache Hadoop YARN ResourceManager  SUCCESS [  0.006 s]
[INFO] Apache Hadoop YARN Server Tests ... SUCCESS [  0.004 s]
[INFO] Apache Hadoop YARN Client . SUCCESS [  0.005 s]
[INFO] Apache Hadoop YARN SharedCacheManager . SUCCESS [  0.003 s]
[INFO] Apache Hadoop YARN Applications ... SUCCESS [  0.002 s]
[INFO] Apache Hadoop YARN DistributedShell ... SUCCESS [  0.004 s]
[INFO] Apache Hadoop YARN Unmanaged Am Launcher .. SUCCESS [  0.003 s]
[INFO] Apache Hadoop YARN Site ... SUCCESS [  0.002 s]
[INFO] Apache Hadoop YARN Registry ... SUCCESS [  0.003 s]
[INFO] Apache Hadoop YARN Project POM  SUCCESS [  0.003 s]
[INFO] Apache Hadoop MapReduce Client  SUCCESS [  0.004 s]
[INFO] Apache Hadoop MapReduce Core .. SUCCESS [  0.004 s]
[INFO] Apache Hadoop MapReduce Common  SUCCESS [  0.003 s]
[INFO] Apache Hadoop MapReduce Shuffle ... SUCCESS [  0.004 s]
[INFO] Apache Hadoop MapReduce App ... SUCCESS [  0.004 s]
[INFO] Apache Hadoop MapReduce HistoryServer . SUCCESS [  0.004 s]
[INFO] Apache Hadoop MapReduce JobClient . SUCCESS [  0.004 s]
[INFO] Apache Hadoop MapReduce HistoryServer Plugins . SUCCESS [  0.003 s]
[INFO] Apache Hadoop MapReduce NativeTask  SUCCESS [  0.003 s]
[INFO] Apache Hadoop MapReduce Examples .. SUCCESS [  0.003 s]
[INFO] Apache Hadoop MapReduce ... SUCCESS [  0.003 s]
[INFO] Apache Hadoop MapReduce Streaming . SUCCESS [  0.004 s]
{noformat}

> Make POM project names consistent
> -
>
> Key: YARN-2784
> URL: https://issues.apache.org/jira/browse/YARN-2784
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: build
>Reporter: Rohith
>Assignee: Rohith
>Priority: Minor
> Attachments: 0002-YARN-2784.patch, YARN-2784.patch
>
>
> All yarn and mapreduce pom.xml files have project names like 
> hadoop-mapreduce/hadoop-yarn. These can be made consistent across the Hadoop 
> project builds, like 'Apache Hadoop Yarn ' and 'Apache Hadoop 
> MapReduce '.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1884) ContainerReport should have nodeHttpAddress

2015-03-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356034#comment-14356034
 ] 

Hadoop QA commented on YARN-1884:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12703734/YARN-1884.3.patch
  against trunk revision aa92b76.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs-plugins
 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask
 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-unmanaged-am-launcher
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-registry 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy:

  org.apache.hadoop.mapreduce.lib.input.TestLineRecordReader
  org.apache.hadoop.mapred.TestLineRecordReader
  
org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobsQuery
  org.apache.hadoop.mapreduce.v2.hs.webapp.TestHSWebApp
  org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing
  org.apache.hadoop.mapreduce.v2.hs.webapp.TestHsWebServicesJobs
  org.apache.hadoop.mapreduce.v2.hs.webapp.dao.TestJobInfo
  org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryEntities
  org.apache.hadoop.mapred.TestClusterMRNotification
  org.apache.hadoop.mapreduce.v2.TestMRJobsWithProfiler
  
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShellWithNodeLabels
  
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
  
org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher
  org.apache.hadoop.yarn.client.api.impl.TestAHSClient
  
org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA
  org.apache.hadoop.yarn.client.cli.TestYarnCLI
  org.apache.hadoop.yarn.client.api.impl.TestYarnClient
  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes

  The test build failed in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
 

Test results: 
https://builds.apache.org/job/PreCommit-

[jira] [Updated] (YARN-2784) Make POM project names consistent

2015-03-10 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-2784:
-
Attachment: 0002-YARN-2784.patch

> Make POM project names consistent
> -
>
> Key: YARN-2784
> URL: https://issues.apache.org/jira/browse/YARN-2784
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: build
>Reporter: Rohith
>Assignee: Rohith
>Priority: Minor
> Attachments: 0002-YARN-2784.patch, YARN-2784.patch
>
>
> All yarn and mapreduce pom.xml files have project names like 
> hadoop-mapreduce/hadoop-yarn. These can be made consistent across the Hadoop 
> project builds, like 'Apache Hadoop Yarn ' and 'Apache Hadoop 
> MapReduce '.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3333) rename TimelineAggregator etc. to TimelineCollector

2015-03-10 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-:
--
Attachment: YARN-.001.patch

Posted the patch. It addresses all the items mentioned in the description. The 
test-patch.sh script passes with +1's.

The only interesting renaming is for TimelineAggregatorsCollection. Since it 
already contains "Collection", I renamed it to TimelineCollectorsManager.

> rename TimelineAggregator etc. to TimelineCollector
> ---
>
> Key: YARN-
> URL: https://issues.apache.org/jira/browse/YARN-
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-.001.patch
>
>
> Per discussions on YARN-2928, let's rename TimelineAggregator, etc. to 
> TimelineCollector, etc.
> There are also several minor issues on the current branch, which can be fixed 
> as part of this:
> - fixing some imports
> - missing license in TestTimelineServerClientIntegration.java
> - whitespaces
> - missing direct dependency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2280) Resource manager web service fields are not accessible

2015-03-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356006#comment-14356006
 ] 

Hudson commented on YARN-2280:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7301 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7301/])
YARN-2280. Resource manager web service fields are not accessible (Krisztian 
Horvath via aw) (aw: rev a5cf985bf501fd032124d121dcae80538db9e380)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/NodesInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/SchedulerTypeInfo.java
* hadoop-yarn-project/CHANGES.txt
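
For reference, a minimal sketch of the kind of change this commit implies: exposing a public getter on the JAXB DAO field named in the description below, so clients no longer need reflection. This is an assumption about the shape of the fix, not the actual patch; the class name here is hypothetical and the field type is simplified.
{code}
// Hypothetical sketch only: a JAXB-annotated DAO whose unmarshalled field is
// not reachable by callers, plus a public accessor so reflection is not needed.
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement
@XmlAccessorType(XmlAccessType.FIELD)
public class SchedulerTypeInfoSketch {

  // populated during unmarshalling of the RM REST response
  private Object schedulerInfo;

  // the fix: a public accessor instead of forcing clients to use reflection
  public Object getSchedulerInfo() {
    return schedulerInfo;
  }
}
{code}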


> Resource manager web service fields are not accessible
> --
>
> Key: YARN-2280
> URL: https://issues.apache.org/jira/browse/YARN-2280
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.4.0, 2.4.1
>Reporter: Krisztian Horvath
>Assignee: Krisztian Horvath
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: YARN-2280.patch
>
>
> Using the resource manager's REST API 
> (org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices), some 
> REST calls return a class whose fields cannot be accessed after 
> unmarshalling. For example SchedulerTypeInfo -> schedulerInfo. Using the same 
> classes on the client side, these fields are only accessible via reflection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3333) rename TimelineAggregator etc. to TimelineCollector

2015-03-10 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-:
-

 Summary: rename TimelineAggregator etc. to TimelineCollector
 Key: YARN-
 URL: https://issues.apache.org/jira/browse/YARN-
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee


Per discussions on YARN-2928, let's rename TimelineAggregator, etc. to 
TimelineCollector, etc.

There are also several minor issues on the current branch, which can be fixed 
as part of this:
- fixing some imports
- missing license in TestTimelineServerClientIntegration.java
- whitespaces
- missing direct dependency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.

2015-03-10 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355995#comment-14355995
 ] 

Jian He commented on YARN-3243:
---

Looks good overall; some comments:
- {{AbstractCSQueue#getCurrentLimitResource}}
-- add comments about how currentLimitResource is calculated
- getResourceLimitsOfChild
-- myLimits -> parentLimits
-- myMaxAvailableResource -> parentMaxAvailableResource
-- childMaxResource -> childConfiguredMaxResource

- setHeadroomInfo -> setQueueResourceLimitsInfo
- needExtraNewOrReservedContainer flag -> better name? 
shouldAllocOrReserveNewContainer?
- similarly for the needExtraNewOrReservedContainer method
- revert the TestContainerAllocation change
- {{ 1GB (am) + 5GB * 2 = 9GB }} the 5GB should be 4GB
- Do you think passing down a QueueHeadRoom compared with a QueueMaxLimit may 
make the code simpler? (see the sketch after the quoted code below)
- checkLimitsToReserve may not need to be invoked if we are assigning a 
reserved container:
{code}
if (reservationsContinueLooking) {
//  // we got here by possibly ignoring parent queue capacity limits. If
//  // the parameter needToUnreserve is true it means we ignored one of
//  // those limits in the chance we could unreserve. If we are here
//  // we aren't trying to unreserve so we can't allocate
//  // anymore due to that parent limit
//  boolean res = checkLimitsToReserve(clusterResource,
{code}
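
To make the parent-to-child limit propagation discussed above concrete, here is a minimal sketch using plain memory values rather than the actual Resource/ResourceCalculator classes; the names are illustrative and not taken from the patch.
{code}
// Illustrative sketch only: propagate a headroom from parent to child so a
// child can never allocate beyond what its ancestors allow.
public class QueueLimitSketch {

  /**
   * Headroom the child may use: the parent's own headroom (set by the
   * grandparent) capped by what is left under the parent's max capacity.
   */
  static long childHeadroom(long parentHeadroom, long parentMax, long parentUsed) {
    return Math.min(parentHeadroom, parentMax - parentUsed);
  }

  public static void main(String[] args) {
    // Numbers from the example in this JIRA's description: A (usage=54, max=55)
    long headroomForA2 = childHeadroom(55L, 55L, 54L);
    // A2 may only allocate 1 more unit even though A2.max is 55.
    System.out.println("A2 headroom = " + headroomForA2);
  }
}
{code}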

> CapacityScheduler should pass headroom from parent to children to make sure 
> ParentQueue obey its capacity limits.
> -
>
> Key: YARN-3243
> URL: https://issues.apache.org/jira/browse/YARN-3243
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch
>
>
> Now CapacityScheduler has some issues in making sure ParentQueue always obeys 
> its capacity limits, for example:
> 1) When allocating a container of a parent queue, it will only check 
> parentQueue.usage < parentQueue.max. If a leaf queue allocated a container.size 
> > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max 
> resource limit, as in the following example:
> {code}
> A (usage=54, max=55)
>  / \
> A1 (usage=53, max=53)   A2 (usage=1, max=55)
> {code}
> Queue-A2 is able to allocate a container since its usage < max, but if we do 
> that, A's usage can exceed A.max.
> 2) When doing the continuous reservation check, the parent queue will only tell 
> its children "you need to unreserve *some* resource so that I will be less than 
> my maximum resource", but it will not tell how much resource needs to be 
> unreserved. This may lead to the parent queue exceeding its configured maximum 
> capacity as well.
> With YARN-3099/YARN-3124, we now have the {{ResourceUsage}} class in each class; 
> *here is my proposal*:
> - ParentQueue will set its children's ResourceUsage.headroom, which means the 
> *maximum resource its children can allocate*.
> - ParentQueue will set its children's headroom to (saying the parent's name is 
> "qA"): min(qA.headroom, qA.max - qA.used). This will make sure qA's 
> ancestors' capacity is enforced as well (qA.headroom is set by qA's 
> parent).
> - {{needToUnReserve}} is not necessary; instead, children can get how much 
> resource needs to be unreserved to keep their parent's resource limit.
> - Moreover, with this, YARN-3026 will make a clear boundary between 
> LeafQueue and FiCaSchedulerApp, and headroom will consider user-limit, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2784) Make POM project names consistent

2015-03-10 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355993#comment-14355993
 ] 

Rohith commented on YARN-2784:
--

Thanks [~aw] for looking into this improvement!
I will update the patch for those minor issues.

> Make POM project names consistent
> -
>
> Key: YARN-2784
> URL: https://issues.apache.org/jira/browse/YARN-2784
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: build
>Reporter: Rohith
>Assignee: Rohith
>Priority: Minor
> Attachments: YARN-2784.patch
>
>
> All yarn and mapreduce pom.xml files have project names like 
> hadoop-mapreduce/hadoop-yarn. These can be made consistent across the Hadoop 
> project builds, like 'Apache Hadoop Yarn ' and 'Apache Hadoop 
> MapReduce '.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2280) Resource manager web service fields are not accessible

2015-03-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355991#comment-14355991
 ] 

Hadoop QA commented on YARN-2280:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12675016/YARN-2280.patch
  against trunk revision 64eb068.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6908//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6908//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6908//console

This message is automatically generated.

> Resource manager web service fields are not accessible
> --
>
> Key: YARN-2280
> URL: https://issues.apache.org/jira/browse/YARN-2280
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.4.0, 2.4.1
>Reporter: Krisztian Horvath
>Assignee: Krisztian Horvath
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: YARN-2280.patch
>
>
> Using the resource manager's REST API 
> (org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices), some 
> REST calls return a class whose fields cannot be accessed after 
> unmarshalling. For example SchedulerTypeInfo -> schedulerInfo. Using the same 
> classes on the client side, these fields are only accessible via reflection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3269) Yarn.nodemanager.remote-app-log-dir could not be configured to fully qualified path

2015-03-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355966#comment-14355966
 ] 

Hadoop QA commented on YARN-3269:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12701264/YARN-3269.2.patch
  against trunk revision 64eb068.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6909//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6909//console

This message is automatically generated.

> Yarn.nodemanager.remote-app-log-dir could not be configured to fully 
> qualified path
> ---
>
> Key: YARN-3269
> URL: https://issues.apache.org/jira/browse/YARN-3269
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-3269.1.patch, YARN-3269.2.patch
>
>
> Log aggregation currently is always relative to the default file system, not 
> an arbitrary file system identified by URI. So we can't put an arbitrary 
> fully-qualified URI into yarn.nodemanager.remote-app-log-dir.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1

2015-03-10 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355955#comment-14355955
 ] 

Zhijie Shen commented on YARN-2928:
---

bq.  let me know if you are OK with the name, and I can make a quick 
refactoring patch.

Sounds good to me.

> Application Timeline Server (ATS) next gen: phase 1
> ---
>
> Key: YARN-2928
> URL: https://issues.apache.org/jira/browse/YARN-2928
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
> Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal 
> v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx
>
>
> We have the application timeline server implemented in yarn per YARN-1530 and 
> YARN-321. Although it is a great feature, we have recognized several critical 
> issues and features that need to be addressed.
> This JIRA proposes the design and implementation changes to address those. 
> This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3332) [Umbrella] Unified Resource Statistics Collection per node

2015-03-10 Thread Lei Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355923#comment-14355923
 ] 

Lei Guo commented on YARN-3332:
---

To support customized resources, a quick list of the areas we need to consider:
- resource definition: how the NM/RM understand the resource; this should be 
treated as metrics-based
- a plug-in framework in the NM/agent:
   * an interface for passing resource information between the plug-in and the 
agent; this could be another RPC interface, so the plug-in can be written in any 
language
   * an interface for loading/triggering plug-ins (optional); this interface is 
optional because a plug-in could be as simple as a cron job
- a sample resource collection plug-in for a specific resource (or resource set); 
this could be a script or a Java class depending on the plug-in framework design 
(a rough interface sketch follows below)
- a communication protocol between RM/NM to support customized resources

This topic is related to our proposal at the June Hadoop Summit on 
multi-dimension scheduling.
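
A rough sketch of what such a collection plug-in interface might look like; all names here are hypothetical and do not correspond to any existing NM API.
{code}
// Hypothetical sketch of a per-node resource statistics plug-in, per the list above.
import java.util.Map;

public interface ResourceStatisticsPlugin {

  /** Resource names this plug-in reports, e.g. "gpu", understood as metrics. */
  Iterable<String> resourceNames();

  /** Latest snapshot of resource name -> available amount on this node. */
  Map<String, Long> collect();
}
{code}
A sample implementation could be as simple as a script or cron job that feeds this map through an RPC call to the agent.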

> [Umbrella] Unified Resource Statistics Collection per node
> --
>
> Key: YARN-3332
> URL: https://issues.apache.org/jira/browse/YARN-3332
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
> Attachments: Design - UnifiedResourceStatisticsCollection.pdf
>
>
> Today in YARN, NodeManager collects statistics like per container resource 
> usage and overall physical resources available on the machine. Currently this 
> is used internally in YARN by the NodeManager for only a limited usage: 
> automatically determining the capacity of resources on node and enforcing 
> memory usage to what is reserved per container.
> This proposal is to extend the existing architecture and collect statistics 
> for usage beyond the existing use-cases.
> Proposal attached in comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2854) The document about timeline service and generic service needs to be updated

2015-03-10 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355909#comment-14355909
 ] 

Zhijie Shen commented on YARN-2854:
---

Naga, thanks for the patch. Sorry for not responding to you in time. Here are 
some high-level comments about the patch:

1. Instead of saying "per-framework information", is it better to say 
"application-specific information"?

2. Current Status is not accurate. It's better to say what the current version 
of the timeline service has done: core functionality, security, the built-in 
generic history service on top of it, and so on. Meanwhile, we're rolling out 
the next generation of the timeline service to make it scalable.

3. For the configurations, can we remove 
"yarn.timeline-service.generic-application-history.enabled" or modify its 
description? I know we need it set to true to enable the CLI to get the generic 
history data, but we shouldn't make the RM rely on it to start sending the 
system metrics.

4. The configs below are more like advanced configs.
{code}
76  | `yarn.timeline-service.ttl-enable` | Enable age off of timeline store 
data. Defaults to true. |
77  | `yarn.timeline-service.ttl-ms` | Time to live for timeline store data 
in milliseconds. Defaults to 604800000 (7 days). |
78  | `yarn.timeline-service.handler-thread-count` | Handler thread count 
to serve the client RPC requests. Defaults to 10. |
79  | `yarn.timeline-service.client.max-retries` | Default maximum number 
of retries for the timeline service client. Defaults to 30. |
80  | `yarn.timeline-service.client.retry-interval-ms` | Default retry time 
interval for the timeline service client. Defaults to 1000. |
{code}

5. Can we rephrase the following sentence to "Enabling the timeline service and 
the generic history service"?
{code}
 Sample configuration for enabling Application History service and 
per-framework data by applications
{code}
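
For readers of the document under review, a minimal sketch of the basic switch plus the advanced knobs listed in item 4, set programmatically; in a real deployment these belong in yarn-site.xml, and the values shown are the defaults quoted in the table above.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    // Basic switch for the timeline service.
    conf.setBoolean("yarn.timeline-service.enabled", true);
    // Advanced knobs from the table above (defaults shown).
    conf.setBoolean("yarn.timeline-service.ttl-enable", true);
    conf.setLong("yarn.timeline-service.ttl-ms", 604800000L);          // 7 days
    conf.setInt("yarn.timeline-service.handler-thread-count", 10);
    conf.setInt("yarn.timeline-service.client.max-retries", 30);
    conf.setLong("yarn.timeline-service.client.retry-interval-ms", 1000L);
  }
}
{code}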

> The document about timeline service and generic service needs to be updated
> ---
>
> Key: YARN-2854
> URL: https://issues.apache.org/jira/browse/YARN-2854
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Naganarasimha G R
>Priority: Critical
> Attachments: TimelineServer.html, YARN-2854.20141120-1.patch, 
> YARN-2854.20150128.1.patch, YARN-2854.20150304.1.patch, timeline_structure.jpg
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3306) [Umbrella] Proposing per-queue Policy driven scheduling in YARN

2015-03-10 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355902#comment-14355902
 ] 

Karthik Kambatla commented on YARN-3306:


I agree with the pain points of a fragmented scheduler state in YARN. 

Is the proposal only to add a policy for assigning resources in the leaf queue 
and leave the rest of the schedulers as is? If so, can you elaborate on the 
advantages (use cases)?

Or is the proposal to arrive at a single scheduler implementation with the 
existing schedulers becoming just policies? If so, we might want to implement a 
new scheduler, maybe in a new branch, capture all the existing features, 
and phase the existing two out.

> [Umbrella] Proposing per-queue Policy driven scheduling in YARN
> ---
>
> Key: YARN-3306
> URL: https://issues.apache.org/jira/browse/YARN-3306
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
> Attachments: PerQueuePolicydrivenschedulinginYARN.pdf
>
>
> Scheduling layout in Apache Hadoop YARN today is very coarse grained. This 
> proposal aims at converting today's rigid scheduling in YARN to a per-queue 
> policy driven architecture.
> We propose the creation of a common policy framework and implement a common 
> set of policies that administrators can pick and choose per queue:
>  - Make scheduling policies configurable per queue
>  - Initially, we limit ourselves to a new type of scheduling policy that 
> determines the ordering of applications within the leaf queue
>  - In the near future, we will also pursue parent-queue level policies and 
> potential algorithm reuse through a separate type of policies that control 
> resource limits per queue, user, application etc.
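
A rough sketch of the kind of per-queue, pluggable ordering policy the proposal describes; the names below are hypothetical and not taken from the design document.
{code}
import java.util.Comparator;

// Sketch: a per-leaf-queue policy that only decides the order in which the
// queue's applications are considered for allocation.
interface LeafQueueOrderingPolicySketch {
  Comparator<AppSketch> getApplicationComparator();
}

// Minimal stand-in for a scheduler application.
class AppSketch {
  final long submitTime;
  AppSketch(long submitTime) { this.submitTime = submitTime; }
}

// Example policy: FIFO by submission time.
class FifoOrderingPolicySketch implements LeafQueueOrderingPolicySketch {
  @Override
  public Comparator<AppSketch> getApplicationComparator() {
    return Comparator.comparingLong(a -> a.submitTime);
  }
}
{code}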



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3332) [Umbrella] Unified Resource Statistics Collection per node

2015-03-10 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355901#comment-14355901
 ] 

Li Lu commented on YARN-3332:
-

Hi [~grey], I think it's a nice idea. I think after YARN-2928, the timeline 
service layer would support this kind of usage (we're supporting "metrics" as a 
generic concept). What we need to do under this JIRA is to make the interface 
available at the NM level, I think?

BTW, it would be cool to have GPU metrics, but I'm not sure if there are any 
general ways to gather this information. It would be helpful if you could 
elaborate a little bit more (if that's related to this JIRA). Thanks!

> [Umbrella] Unified Resource Statistics Collection per node
> --
>
> Key: YARN-3332
> URL: https://issues.apache.org/jira/browse/YARN-3332
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
> Attachments: Design - UnifiedResourceStatisticsCollection.pdf
>
>
> Today in YARN, NodeManager collects statistics like per container resource 
> usage and overall physical resources available on the machine. Currently this 
> is used internally in YARN by the NodeManager for only a limited usage: 
> automatically determining the capacity of resources on node and enforcing 
> memory usage to what is reserved per container.
> This proposal is to extend the existing architecture and collect statistics 
> for usage beyond the existing use-cases.
> Proposal attached in comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3332) [Umbrella] Unified Resource Statistics Collection per node

2015-03-10 Thread Lei Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355891#comment-14355891
 ] 

Lei Guo commented on YARN-3332:
---

Any consideration of supporting plug-ins for customized resource statistics 
collection in the NM? We may need other types of resource information for 
scheduling purposes later, for example GPU-related information.

> [Umbrella] Unified Resource Statistics Collection per node
> --
>
> Key: YARN-3332
> URL: https://issues.apache.org/jira/browse/YARN-3332
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
> Attachments: Design - UnifiedResourceStatisticsCollection.pdf
>
>
> Today in YARN, NodeManager collects statistics like per container resource 
> usage and overall physical resources available on the machine. Currently this 
> is used internally in YARN by the NodeManager for only a limited usage: 
> automatically determining the capacity of resources on node and enforcing 
> memory usage to what is reserved per container.
> This proposal is to extend the existing architecture and collect statistics 
> for usage beyond the existing use-cases.
> Proposal attached in comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3154) Should not upload partial logs for MR jobs or other "short-running' applications

2015-03-10 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355887#comment-14355887
 ] 

Xuan Gong commented on YARN-3154:
-

bq. do the long running applications such as HBase on YARN using Slider need to 
do anything to make sure that partial logs are uploaded?

[~sumitmohanty] Sorry for the late reply. Yes, we need to change some 
configuration settings on the ApplicationSubmissionContext.

Here is a scenario which explains the purpose of this ticket:
In MapReduce, we create stdout, stderr, and syslog for every container. 
Since a MapReduce job is relatively short (compared with long-running 
applications), it does not make sense to upload those logs partially unless the 
users really want to.
So the old include_pattern/exclude_pattern in the ASC will be used to indicate 
which log files need to be aggregated explicitly at app finish, 
and we introduce two additional parameters in the ASC which are more relevant 
to long-running applications, such as HBase on YARN:
{code}
rolled_logs_include_pattern 
rolled_logs_exclude_pattern
{code}
If we want the logs to be uploaded (partial logs) while the app is running, we 
should use these two newly introduced parameters.

For the HBase on YARN using Slider case, after the patch, we need to switch 
the values from the old include_pattern/exclude_pattern to the new 
rolled_logs_include_pattern/rolled_logs_exclude_pattern (a rough usage sketch 
follows below).
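
A rough sketch of what that switch might look like on the client side. The rolled-logs setter names are assumptions based on the parameter names above (the exact API is defined by the patch), and the include pattern is an illustrative value.
{code}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.LogAggregationContext;
import org.apache.hadoop.yarn.util.Records;

public class RolledLogsSketch {
  public static void main(String[] args) {
    LogAggregationContext logCtx = Records.newRecord(LogAggregationContext.class);
    // Long-running service: upload matching logs while the app is running.
    logCtx.setRolledLogsIncludePattern("hbase*.log*");   // assumed setter name
    logCtx.setRolledLogsExcludePattern("");              // assumed setter name

    ApplicationSubmissionContext asc =
        Records.newRecord(ApplicationSubmissionContext.class);
    asc.setLogAggregationContext(logCtx);
  }
}
{code}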

> Should not upload partial logs for MR jobs or other "short-running' 
> applications 
> -
>
> Key: YARN-3154
> URL: https://issues.apache.org/jira/browse/YARN-3154
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: YARN-3154.1.patch, YARN-3154.2.patch, YARN-3154.3.patch
>
>
> Currently, if we are running an MR job and we do not set the log interval 
> properly, we will have its partial logs uploaded and then removed from the 
> local filesystem, which is not right.
> We should only upload partial logs for LRS applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2280) Resource manager web service fields are not accessible

2015-03-10 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-2280:
---
Issue Type: Improvement  (was: Bug)

> Resource manager web service fields are not accessible
> --
>
> Key: YARN-2280
> URL: https://issues.apache.org/jira/browse/YARN-2280
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.4.0, 2.4.1
>Reporter: Krisztian Horvath
>Assignee: Krisztian Horvath
>Priority: Trivial
> Attachments: YARN-2280.patch
>
>
> Using the resource manager's REST API 
> (org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices), some 
> REST calls return a class whose fields cannot be accessed after 
> unmarshalling. For example SchedulerTypeInfo -> schedulerInfo. Using the same 
> classes on the client side, these fields are only accessible via reflection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.

2015-03-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355857#comment-14355857
 ] 

Hadoop QA commented on YARN-3243:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12703744/YARN-3243.3.patch
  against trunk revision 64eb068.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6906//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6906//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6906//console

This message is automatically generated.

> CapacityScheduler should pass headroom from parent to children to make sure 
> ParentQueue obey its capacity limits.
> -
>
> Key: YARN-3243
> URL: https://issues.apache.org/jira/browse/YARN-3243
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch
>
>
> Now CapacityScheduler has some issues in making sure ParentQueue always obeys 
> its capacity limits, for example:
> 1) When allocating a container of a parent queue, it will only check 
> parentQueue.usage < parentQueue.max. If a leaf queue allocated a container.size 
> > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max 
> resource limit, as in the following example:
> {code}
> A (usage=54, max=55)
>  / \
> A1 (usage=53, max=53)   A2 (usage=1, max=55)
> {code}
> Queue-A2 is able to allocate a container since its usage < max, but if we do 
> that, A's usage can exceed A.max.
> 2) When doing the continuous reservation check, the parent queue will only tell 
> its children "you need to unreserve *some* resource so that I will be less than 
> my maximum resource", but it will not tell how much resource needs to be 
> unreserved. This may lead to the parent queue exceeding its configured maximum 
> capacity as well.
> With YARN-3099/YARN-3124, we now have the {{ResourceUsage}} class in each class; 
> *here is my proposal*:
> - ParentQueue will set its children's ResourceUsage.headroom, which means the 
> *maximum resource its children can allocate*.
> - ParentQueue will set its children's headroom to (saying the parent's name is 
> "qA"): min(qA.headroom, qA.max - qA.used). This will make sure qA's 
> ancestors' capacity is enforced as well (qA.headroom is set by qA's 
> parent).
> - {{needToUnReserve}} is not necessary; instead, children can get how much 
> resource needs to be unreserved to keep their parent's resource limit.
> - Moreover, with this, YARN-3026 will make a clear boundary between 
> LeafQueue and FiCaSchedulerApp, and headroom will consider user-limit, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records

2015-03-10 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355849#comment-14355849
 ] 

Jonathan Eagles commented on YARN-3267:
---

[~lichangleo], this patch looks good in general. A few minor things:

# Please consider changing the CheckAcl constructor to take a UGI and removing 
the ugi parameter from the check method (a rough sketch follows below)
# Please change the CheckAcl variable name tester to checkAcl
# Please clean up the trailing whitespace in the patch
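
A rough sketch of the constructor shape suggested in item 1. CheckAcl is internal to the patch, so the class name, method body, and entity type used here are assumptions for illustration only.
{code}
import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;

// Sketch: bind the caller's UGI once in the constructor instead of passing it
// into every check() call.
class CheckAclSketch {
  private final UserGroupInformation ugi;

  CheckAclSketch(UserGroupInformation ugi) {
    this.ugi = ugi;
  }

  boolean check(TimelineEntity entity) throws IOException {
    // Placeholder: the real implementation would consult the timeline ACL manager.
    return ugi != null && entity != null;
  }
}
{code}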

> Timelineserver applies the ACL rules after applying the limit on the number 
> of records
> --
>
> Key: YARN-3267
> URL: https://issues.apache.org/jira/browse/YARN-3267
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Prakash Ramachandran
>Assignee: Chang Li
> Attachments: YARN_3267_V1.patch, YARN_3267_V2.patch, 
> YARN_3267_WIP.patch, YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, 
> YARN_3267_WIP3.patch
>
>
> While fetching entities from the timeline server, the limit is applied to the 
> entities to be fetched from leveldb, and the ACL filters are applied after this 
> (TimelineDataManager.java::getEntities). 
> This could mean that even if there are entities available which match the 
> query criteria, we could end up not getting any results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2280) Resource manager web service fields are not accessible

2015-03-10 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-2280:
---
Fix Version/s: (was: 2.7.0)

> Resource manager web service fields are not accessible
> --
>
> Key: YARN-2280
> URL: https://issues.apache.org/jira/browse/YARN-2280
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0, 2.4.1
>Reporter: Krisztian Horvath
>Assignee: Krisztian Horvath
>Priority: Trivial
> Attachments: YARN-2280.patch
>
>
> Using the resource manager's REST API 
> (org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices), some 
> REST calls return a class whose fields cannot be accessed after 
> unmarshalling. For example SchedulerTypeInfo -> schedulerInfo. Using the same 
> classes on the client side, these fields are only accessible via reflection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2784) Make POM project names consistent

2015-03-10 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-2784:
---
Summary: Make POM project names consistent  (was: Yarn project module names 
in POM needs to consistent across hadoop project)

> Make POM project names consistent
> -
>
> Key: YARN-2784
> URL: https://issues.apache.org/jira/browse/YARN-2784
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: build
>Reporter: Rohith
>Assignee: Rohith
>Priority: Minor
> Attachments: YARN-2784.patch
>
>
> All yarn and mapreduce pom.xml files have project names like 
> hadoop-mapreduce/hadoop-yarn. These can be made consistent across the Hadoop 
> project builds, like 'Apache Hadoop Yarn ' and 'Apache Hadoop 
> MapReduce '.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2784) Yarn project module names in POM needs to consistent across hadoop project

2015-03-10 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-2784:
---
Summary: Yarn project module names in POM needs to consistent across hadoop 
project  (was: Yarn project module names in POM needs to consistent acros 
hadoop project)

> Yarn project module names in POM needs to consistent across hadoop project
> --
>
> Key: YARN-2784
> URL: https://issues.apache.org/jira/browse/YARN-2784
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: build
>Reporter: Rohith
>Assignee: Rohith
>Priority: Minor
> Attachments: YARN-2784.patch
>
>
> All yarn and mapreduce pom.xml files have project names like 
> hadoop-mapreduce/hadoop-yarn. These can be made consistent across the Hadoop 
> project builds, like 'Apache Hadoop Yarn ' and 'Apache Hadoop 
> MapReduce '.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2784) Yarn project module names in POM needs to consistent acros hadoop project

2015-03-10 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355801#comment-14355801
 ] 

Allen Wittenauer commented on YARN-2784:


Minor issues:

Distributedshell -> DistributedShell
Am Luncher -> AM Launcher
ApplicationHistoryservice -> ApplicationHistoryService

I'm also tempted to say that it should be 'YARN' not Yarn.

> Yarn project module names in POM needs to consistent acros hadoop project
> -
>
> Key: YARN-2784
> URL: https://issues.apache.org/jira/browse/YARN-2784
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: build
>Reporter: Rohith
>Assignee: Rohith
>Priority: Minor
> Attachments: YARN-2784.patch
>
>
> All yarn and mapreduce pom.xml files have project names like 
> hadoop-mapreduce/hadoop-yarn. These can be made consistent across the Hadoop 
> project builds, like 'Apache Hadoop Yarn ' and 'Apache Hadoop 
> MapReduce '.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3332) [Umbrella] Unified Resource Statistics Collection per node

2015-03-10 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355798#comment-14355798
 ] 

Karthik Kambatla commented on YARN-3332:


Thanks for filing this and working on the design, Vinod. I like the idea of a 
clean interface to get node and container resource usage info. 

Is there any reason why you think a service architecture is better than it 
being a common library? How much information is shared among the consumers of 
this interface? For instance, both HDFS and YARN would be interested in the 
availability and usage of CPU, memory, disk and network for the entire node. 
Isn't all the other information of exclusive interest to one or the other? 

Have other questions/comments on the design, but will hold off until we decide 
on service vs library. 

> [Umbrella] Unified Resource Statistics Collection per node
> --
>
> Key: YARN-3332
> URL: https://issues.apache.org/jira/browse/YARN-3332
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
> Attachments: Design - UnifiedResourceStatisticsCollection.pdf
>
>
> Today in YARN, NodeManager collects statistics like per container resource 
> usage and overall physical resources available on the machine. Currently this 
> is used internally in YARN by the NodeManager for only a limited usage: 
> automatically determining the capacity of resources on node and enforcing 
> memory usage to what is reserved per container.
> This proposal is to extend the existing architecture and collect statistics 
> for usage beyond the existing use-cases.
> Proposal attached in comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1884) ContainerReport should have nodeHttpAddress

2015-03-10 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355771#comment-14355771
 ] 

Zhijie Shen commented on YARN-1884:
---

+1 LGTM, will commit the patch after Jenkins feedback.

> ContainerReport should have nodeHttpAddress
> ---
>
> Key: YARN-1884
> URL: https://issues.apache.org/jira/browse/YARN-1884
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Xuan Gong
> Attachments: YARN-1884.1.patch, YARN-1884.2.patch, YARN-1884.3.patch
>
>
> In the web UI, we're going to show the node, which used to link to the NM 
> web page. However, on the AHS web UI, and the RM web UI after YARN-1809, the node 
> field has to be set to the nodeID where the container is allocated. We need to 
> add nodeHttpAddress to the ContainerReport to link users to the NM web page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3215) Respect labels in CapacityScheduler when computing headroom

2015-03-10 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355766#comment-14355766
 ] 

Wangda Tan commented on YARN-3215:
--

One possible failure case is like 
https://issues.apache.org/jira/browse/YARN-2008. When we have a hierarchy of 
labeled queues, the CS can report incorrect headroom to the AM, which can cause 
problems. I haven't reproduced/tested this issue, but label-based headroom 
calculation is not supported now.

> Respect labels in CapacityScheduler when computing headroom
> ---
>
> Key: YARN-3215
> URL: https://issues.apache.org/jira/browse/YARN-3215
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>
> In the existing CapacityScheduler, when computing the headroom of an application, 
> it will only consider the "non-labeled" nodes of this application.
> But it is possible the application is asking for labeled resources, so 
> headroom-by-label (like 5G resource available under node-label=red) is 
> required to get better resource allocation and avoid deadlocks such as 
> MAPREDUCE-5928.
> This JIRA could involve both API changes (such as adding a 
> label-to-available-resource map in AllocateResponse) and also internal 
> changes in CapacityScheduler.
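
A minimal sketch of the label-to-available-resource map idea mentioned in the description above, using plain memory values and hypothetical names rather than the actual Resource records.
{code}
import java.util.HashMap;
import java.util.Map;

public class LabelHeadroomSketch {
  public static void main(String[] args) {
    // Headroom per node label, e.g. 5G available under node-label=red.
    Map<String, Long> headroomByLabelMB = new HashMap<>();
    headroomByLabelMB.put("red", 5L * 1024);   // labeled partition
    headroomByLabelMB.put("", 20L * 1024);     // default (no-label) partition
    System.out.println(headroomByLabelMB);
  }
}
{code}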



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2368) ResourceManager failed when ZKRMStateStore tries to update znode data larger than 1MB

2015-03-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355748#comment-14355748
 ] 

Hadoop QA commented on YARN-2368:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12658387/YARN-2368.patch
  against trunk revision aa92b76.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6904//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6904//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6904//console

This message is automatically generated.

> ResourceManager failed when ZKRMStateStore tries to update znode data larger 
> than 1MB
> -
>
> Key: YARN-2368
> URL: https://issues.apache.org/jira/browse/YARN-2368
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
>Reporter: Leitao Guo
>Priority: Critical
> Attachments: YARN-2368.patch
>
>
> Both ResourceManagers throw out STATE_STORE_OP_FAILED events and finally 
> fail. The ZooKeeper log shows that ZKRMStateStore tries to update a znode 
> larger than 1MB, which is the default configuration of the ZooKeeper server and 
> client in 'jute.maxbuffer'.
> ResourceManager (ip addr: 10.153.80.8) log shows as the following:
> {code}
> 2014-07-25 22:33:11,078 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session connected
> 2014-07-25 22:33:11,078 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session restored
> 2014-07-25 22:33:11,214 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
> org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
> STATE_STORE_OP_FAILED. Cause:
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for 
> /rmstore/ZKRMStateRoot/RMAppRoot/application_1406264354826_1645/appattempt_1406264354826_1645_01
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:926)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:923)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.existsWithRetries(ZKRMStateStore.java:923)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:620)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
>

[jira] [Commented] (YARN-2784) Yarn project module names in POM needs to consistent acros hadoop project

2015-03-10 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355682#comment-14355682
 ] 

Allen Wittenauer commented on YARN-2784:


I have a really hard time believing this patch broke all those tests. lol

> Yarn project module names in POM needs to consistent acros hadoop project
> -
>
> Key: YARN-2784
> URL: https://issues.apache.org/jira/browse/YARN-2784
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: build
>Reporter: Rohith
>Assignee: Rohith
>Priority: Minor
> Attachments: YARN-2784.patch
>
>
> All yarn and mapreduce pom.xml files have the project name 
> hadoop-mapreduce/hadoop-yarn. This can be made consistent across Hadoop 
> project builds, like 'Apache Hadoop Yarn' and 'Apache Hadoop 
> MapReduce'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2784) Yarn project module names in POM needs to consistent acros hadoop project

2015-03-10 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355684#comment-14355684
 ] 

Allen Wittenauer commented on YARN-2784:


Fired off another jenkins run.  I'm doing a test build myself as well.

> Yarn project module names in POM needs to consistent acros hadoop project
> -
>
> Key: YARN-2784
> URL: https://issues.apache.org/jira/browse/YARN-2784
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: build
>Reporter: Rohith
>Assignee: Rohith
>Priority: Minor
> Attachments: YARN-2784.patch
>
>
> All yarn and mapreduce pom.xml files have the project name 
> hadoop-mapreduce/hadoop-yarn. This can be made consistent across Hadoop 
> project builds, like 'Apache Hadoop Yarn' and 'Apache Hadoop 
> MapReduce'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.

2015-03-10 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3243:
-
Attachment: YARN-3243.3.patch

Updated the patch to fix the TestWorkPreservingRMRestart failure. The Findbugs 
warnings are not related to this patch.

> CapacityScheduler should pass headroom from parent to children to make sure 
> ParentQueue obey its capacity limits.
> -
>
> Key: YARN-3243
> URL: https://issues.apache.org/jira/browse/YARN-3243
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-3243.1.patch, YARN-3243.2.patch, YARN-3243.3.patch
>
>
> Now CapacityScheduler has some issues making sure a ParentQueue always obeys 
> its capacity limits, for example:
> 1) When allocating a container of a parent queue, it will only check 
> parentQueue.usage < parentQueue.max. If a leaf queue allocates a container.size 
> > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max 
> resource limit, as in the following example:
> {code}
> A  (usage=54, max=55)
>/ \
>   A1 A2 (usage=1, max=55)
> (usage=53, max=53)
> {code}
> Queue-A2 is able to allocate a container since its usage < max, but if we do 
> that, A's usage can exceed A.max.
> 2) When doing the continuous reservation check, the parent queue will only tell 
> children "you need to unreserve *some* resource, so that I will be less than my 
> maximum resource", but it will not tell how much resource needs to be 
> unreserved. This may lead to the parent queue exceeding its configured maximum 
> capacity as well.
> With YARN-3099/YARN-3124, now we have a {{ResourceUsage}} class in each queue, 
> *here is my proposal*:
> - ParentQueue will set its children's ResourceUsage.headroom, which means the 
> *maximum resource its children can allocate*.
> - ParentQueue will set its children's headroom to be (saying the parent's name 
> is "qA"): min(qA.headroom, qA.max - qA.used). This will make sure qA's 
> ancestors' capacity will be enforced as well (qA.headroom is set by qA's 
> parent).
> - {{needToUnReserve}} is not necessary; instead, children can get how much 
> resource needs to be unreserved to keep within their parent's resource limit.
> - Moreover, with this, YARN-3026 will make a clear boundary between 
> LeafQueue and FiCaSchedulerApp, and headroom will consider user-limit, etc.
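
For illustration, the headroom propagation proposed above can be sketched in a few 
lines of Java. This is a minimal sketch under simplifying assumptions: resources are 
reduced to a single long value instead of the multi-dimensional {{Resource}} type, and 
the class and field names are hypothetical, not the actual CapacityScheduler classes.

{code}
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the proposed headroom propagation; resources are reduced
// to a single long value instead of the multi-dimensional Resource type.
class QueueSketch {
  long used;      // resource currently used by this queue
  long max;       // configured maximum resource of this queue
  long headroom;  // limit handed down by this queue's parent

  List<QueueSketch> children = new ArrayList<QueueSketch>();

  // Parent pushes limits down before children try to allocate.
  void pushDownHeadroom() {
    for (QueueSketch child : children) {
      // child's limit = min(parent's own headroom, parent's remaining room),
      // i.e. min(qA.headroom, qA.max - qA.used) from the proposal above.
      child.headroom = Math.min(this.headroom, this.max - this.used);
      child.pushDownHeadroom();
    }
  }

  // A child may only allocate if the container fits within both its own max
  // and the pushed-down headroom, enforcing every ancestor's limit.
  boolean canAllocate(long containerSize) {
    return used + containerSize <= Math.min(max, headroom);
  }
}
{code}

With the example tree above (A.max=55, A.used=54), A pushes a headroom of 1 down to 
A2, so A2 cannot allocate a larger container even though A2.usage < A2.max.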



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.

2015-03-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355608#comment-14355608
 ] 

Hadoop QA commented on YARN-3243:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12703715/YARN-3243.2.patch
  against trunk revision aa92b76.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6903//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6903//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6903//console

This message is automatically generated.

> CapacityScheduler should pass headroom from parent to children to make sure 
> ParentQueue obey its capacity limits.
> -
>
> Key: YARN-3243
> URL: https://issues.apache.org/jira/browse/YARN-3243
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-3243.1.patch, YARN-3243.2.patch
>
>
> Now CapacityScheduler has some issues making sure a ParentQueue always obeys 
> its capacity limits, for example:
> 1) When allocating a container of a parent queue, it will only check 
> parentQueue.usage < parentQueue.max. If a leaf queue allocates a container.size 
> > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max 
> resource limit, as in the following example:
> {code}
> A  (usage=54, max=55)
>/ \
>   A1 A2 (usage=1, max=55)
> (usage=53, max=53)
> {code}
> Queue-A2 is able to allocate a container since its usage < max, but if we do 
> that, A's usage can exceed A.max.
> 2) When doing the continuous reservation check, the parent queue will only tell 
> children "you need to unreserve *some* resource, so that I will be less than my 
> maximum resource", but it will not tell how much resource needs to be 
> unreserved. This may lead to the parent queue exceeding its configured maximum 
> capacity as well.
> With YARN-3099/YARN-3124, now we have a {{ResourceUsage}} class in each queue, 
> *here is my proposal*:
> - ParentQueue will set its children's ResourceUsage.headroom, which means the 
> *maximum resource its children can allocate*.
> - ParentQueue will set its children's headroom to be (saying the parent's name 
> is "qA"): min(qA.headroom, qA.max - qA.used). This will make sure qA's 
> ancestors' capacity will be enforced as well (qA.headroom is set by qA's 
> parent).
> - {{needToUnReserve}} is not necessary; instead, children can get how much 
> resource needs to be unreserved to keep within their parent's resource limit.
> - Moreover, with this, YARN-3026 will make a clear boundary between 
> LeafQueue and FiCaSchedulerApp, and headroom will consider user-limit, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-1884) ContainerReport should have nodeHttpAddress

2015-03-10 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-1884:

Attachment: YARN-1884.3.patch

Addressed all the comments.

> ContainerReport should have nodeHttpAddress
> ---
>
> Key: YARN-1884
> URL: https://issues.apache.org/jira/browse/YARN-1884
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Xuan Gong
> Attachments: YARN-1884.1.patch, YARN-1884.2.patch, YARN-1884.3.patch
>
>
> In the web UI, we're going to show the node, which used to be a link to the NM 
> web page. However, on the AHS web UI, and the RM web UI after YARN-1809, the node 
> field has to be set to the nodeID where the container is allocated. We need to 
> add nodeHttpAddress to the ContainerReport to link users to the NM web page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2141) [Umbrella] Capture container and node resource consumption

2015-03-10 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355590#comment-14355590
 ] 

Vinod Kumar Vavilapalli commented on YARN-2141:
---

bq. One other related effort is YARN-2928 which is also planning to obtain and 
send information about container resource-usage to a per-application 
aggregator. We should try to unify these..
Filed YARN-3332.

> [Umbrella] Capture container and node resource consumption
> --
>
> Key: YARN-2141
> URL: https://issues.apache.org/jira/browse/YARN-2141
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Carlo Curino
>Priority: Minor
>
> Collecting per-container and per-node resource consumption statistics in a 
> fairly granular manner, and making them available to both infrastructure code 
> (e.g., schedulers) and users (e.g., AMs, or users directly via webapps), can 
> facilitate several kinds of performance work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2745) Extend YARN to support multi-resource packing of tasks

2015-03-10 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355594#comment-14355594
 ] 

Vinod Kumar Vavilapalli commented on YARN-2745:
---

Filed YARN-3332 that should unify the stats collection on a NodeManager and 
help this feature too.

> Extend YARN to support multi-resource packing of tasks
> --
>
> Key: YARN-2745
> URL: https://issues.apache.org/jira/browse/YARN-2745
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager, scheduler
>Reporter: Robert Grandl
>Assignee: Robert Grandl
> Attachments: sigcomm_14_tetris_talk.pptx, tetris_design_doc.docx, 
> tetris_paper.pdf
>
>
> In this umbrella JIRA we propose an extension to existing scheduling 
> techniques, which accounts for all resources used by a task (CPU, memory, 
> disk, network) and is able to achieve three competing objectives: 
> fairness, improved cluster utilization, and reduced average job completion time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1

2015-03-10 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355587#comment-14355587
 ] 

Vinod Kumar Vavilapalli commented on YARN-2928:
---

bq. Overall I'd like to push other efforts like YARN-2141, YARN-1012 to fit 
into the current architecture being proposed in this JIRA. This is so that we 
don't duplicate stats collection between efforts.
Filed YARN-3332.

> Application Timeline Server (ATS) next gen: phase 1
> ---
>
> Key: YARN-2928
> URL: https://issues.apache.org/jira/browse/YARN-2928
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
> Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal 
> v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx
>
>
> We have the application timeline server implemented in yarn per YARN-1530 and 
> YARN-321. Although it is a great feature, we have recognized several critical 
> issues and features that need to be addressed.
> This JIRA proposes the design and implementation changes to address those. 
> This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3332) [Umbrella] Unified Resource Statistics Collection per node

2015-03-10 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355583#comment-14355583
 ] 

Vinod Kumar Vavilapalli commented on YARN-3332:
---

Linking related tickets that can leverage this: YARN-2928, YARN-2745.

> [Umbrella] Unified Resource Statistics Collection per node
> --
>
> Key: YARN-3332
> URL: https://issues.apache.org/jira/browse/YARN-3332
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
> Attachments: Design - UnifiedResourceStatisticsCollection.pdf
>
>
> Today in YARN, the NodeManager collects statistics like per-container resource 
> usage and the overall physical resources available on the machine. Currently this 
> is used internally in YARN by the NodeManager for only limited purposes: 
> automatically determining the capacity of resources on the node and enforcing 
> memory usage to what is reserved per container.
> This proposal is to extend the existing architecture and collect statistics 
> for usage beyond the existing use cases.
> Proposal attached in comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3332) [Umbrella] Unified Resource Statistics Collection per node

2015-03-10 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-3332:
--
Attachment: Design - UnifiedResourceStatisticsCollection.pdf

Attaching the proposal doc. Feedback appreciated.

> [Umbrella] Unified Resource Statistics Collection per node
> --
>
> Key: YARN-3332
> URL: https://issues.apache.org/jira/browse/YARN-3332
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
> Attachments: Design - UnifiedResourceStatisticsCollection.pdf
>
>
> Today in YARN, the NodeManager collects statistics like per-container resource 
> usage and the overall physical resources available on the machine. Currently this 
> is used internally in YARN by the NodeManager for only limited purposes: 
> automatically determining the capacity of resources on the node and enforcing 
> memory usage to what is reserved per container.
> This proposal is to extend the existing architecture and collect statistics 
> for usage beyond the existing use cases.
> Proposal attached in comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3331) NodeManager should use directory other than tmp for extracting and loading leveldbjni

2015-03-10 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1437#comment-1437
 ] 

Anubhav Dhoot commented on YARN-3331:
-

As per 
http://grepcode.com/file/repo1.maven.org/maven2/org.fusesource.leveldbjni/leveldbjni-all/1.8/org/fusesource/hawtjni/runtime/Library.java
if we define the property library.${name}.path, we can avoid having it use the 
temporary directory.
{noformat}
The file extraction is attempted until it succeeds in the following directories.
1. The directory pointed to by the "library.${name}.path" System property (if 
set)
2. a temporary directory (uses the "java.io.tmpdir" System property)
{noformat}

So we can fix this by setting -Dlibrary.leveldbjni.path=$(pwd) in the 
nodemanager options.
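
For illustration, the same mechanism exercised programmatically. This is a sketch 
under the assumption that the store is opened through the usual 
org.fusesource.leveldbjni / org.iq80.leveldb client API; the store path is made up. 
Setting the system property before leveldbjni is first touched has the same effect 
as passing -Dlibrary.leveldbjni.path on the JVM command line:

{code}
import java.io.File;

import org.fusesource.leveldbjni.JniDBFactory;
import org.iq80.leveldb.DB;
import org.iq80.leveldb.Options;

public class LeveldbPathExample {
  public static void main(String[] args) throws Exception {
    // Point the native-library extraction at the process working directory
    // instead of java.io.tmpdir (which may be mounted noexec).
    System.setProperty("library.leveldbjni.path",
        new File(".").getAbsolutePath());

    Options options = new Options();
    options.createIfMissing(true);
    // The first use of JniDBFactory triggers extraction/loading of the
    // native library from the directory configured above.
    DB db = JniDBFactory.factory.open(new File("example-leveldb-store"), options);
    db.close();
  }
}
{code}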

> NodeManager should use directory other than tmp for extracting and loading 
> leveldbjni
> -
>
> Key: YARN-3331
> URL: https://issues.apache.org/jira/browse/YARN-3331
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>
> /tmp can be required to be noexec in many environments. This causes a 
> problem when the nodemanager tries to load the leveldbjni library, which can get 
> unpacked and executed from /tmp.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3331) NodeManager should use directory other than tmp for extracting and loading leveldbjni

2015-03-10 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3331:

Component/s: nodemanager

> NodeManager should use directory other than tmp for extracting and loading 
> leveldbjni
> -
>
> Key: YARN-3331
> URL: https://issues.apache.org/jira/browse/YARN-3331
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>
> /tmp can be required to be noexec in many environments. This causes a 
> problem when the nodemanager tries to load the leveldbjni library, which can get 
> unpacked and executed from /tmp.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Moved] (YARN-3331) NodeManager should use directory other than tmp for extracting and loading leveldbjni

2015-03-10 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot moved MAPREDUCE-6272 to YARN-3331:


Key: YARN-3331  (was: MAPREDUCE-6272)
Project: Hadoop YARN  (was: Hadoop Map/Reduce)

> NodeManager should use directory other than tmp for extracting and loading 
> leveldbjni
> -
>
> Key: YARN-3331
> URL: https://issues.apache.org/jira/browse/YARN-3331
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>
> /tmp can be required to be noexec in many environments. This causes a 
> problem when the nodemanager tries to load the leveldbjni library, which can get 
> unpacked and executed from /tmp.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2368) ResourceManager failed when ZKRMStateStore tries to update znode data larger than 1MB

2015-03-10 Thread David Morel (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355471#comment-14355471
 ] 

David Morel commented on YARN-2368:
---

Passing "-Djute.maxbuffer=" in the startup scripts environment (in 
/etc/hadoop/conf/yarn-env.sh or  /etc/default/hadoop-yarn-resourcemanager) to 
the YARN_RESOURCEMANAGER_OPTS variable does the trick. It's picked up by the RM 
binary and effective.
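
For illustration only: jute.maxbuffer is read by the ZooKeeper client as a Java 
system property, which is why placing it in YARN_RESOURCEMANAGER_OPTS works. A 
minimal standalone sketch (the connection string and the 4 MB value are made-up 
examples; note that the ZooKeeper server enforces its own jute.maxbuffer limit as 
well):

{code}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class JuteMaxBufferExample {
  public static void main(String[] args) throws Exception {
    // Equivalent to -Djute.maxbuffer=4194304 on the command line; it must be
    // set before the ZooKeeper client classes initialize.
    System.setProperty("jute.maxbuffer", String.valueOf(4 * 1024 * 1024));

    ZooKeeper zk = new ZooKeeper("zkhost:2181", 10000, new Watcher() {
      @Override
      public void process(WatchedEvent event) {
        // no-op watcher for this sketch
      }
    });
    // ... reads/writes of znodes up to ~4MB are now allowed client-side ...
    zk.close();
  }
}
{code}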

> ResourceManager failed when ZKRMStateStore tries to update znode data larger 
> than 1MB
> -
>
> Key: YARN-2368
> URL: https://issues.apache.org/jira/browse/YARN-2368
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
>Reporter: Leitao Guo
>Priority: Critical
> Attachments: YARN-2368.patch
>
>
> Both ResourceManagers throw out STATE_STORE_OP_FAILED events and finally 
> fail. ZooKeeper log shows that ZKRMStateStore tries to update a znode 
> larger than 1MB, which is the default configuration of ZooKeeper server and 
> client in 'jute.maxbuffer'.
> ResourceManager (ip addr: 10.153.80.8) log shows as the following:
> {code}
> 2014-07-25 22:33:11,078 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session connected
> 2014-07-25 22:33:11,078 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session restored
> 2014-07-25 22:33:11,214 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
> org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
> STATE_STORE_OP_FAILED. Cause:
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for 
> /rmstore/ZKRMStateRoot/RMAppRoot/application_1406264354826_1645/appattempt_1406264354826_1645_01
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:926)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:923)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.existsWithRetries(ZKRMStateStore.java:923)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:620)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Meanwhile, ZooKeeps log shows as the following:
> {code}
> 2014-07-25 22:10:09,728 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - 
> Accepted socket connection from /10.153.80.8:58890
> 2014-07-25 22:10:09,730 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client 
> attempting to renew session 0x247684586e70006 at /10.153.80.8:58890
> 2014-07-25 22:10:09,730 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@107] - Revalidating 
> client: 0x247684586e70006
> 2014-07-25 22:10:09,730 [myid:1] - INFO  
> [QuorumPeer[myid=1]/0.0.0.0:2181:ZooKeeperServer@595] - Established session 
> 0x247684586e70006 with negotiated timeout 1 for client /10.153.80.8:58890
> 2014-07-25 22:10:09,730 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@863] - got auth 
> packet /10.153.80.8:58890
> 2014-07-25 22:10:09,730 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@897] - auth 
> success /10.153.80.8:58890
> 2014-07-25 22:10:09,742 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception 
> causing c

[jira] [Updated] (YARN-3243) CapacityScheduler should pass headroom from parent to children to make sure ParentQueue obey its capacity limits.

2015-03-10 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3243:
-
Attachment: YARN-3243.2.patch

Attached a new patch that addresses [~jianhe]'s comments.

> CapacityScheduler should pass headroom from parent to children to make sure 
> ParentQueue obey its capacity limits.
> -
>
> Key: YARN-3243
> URL: https://issues.apache.org/jira/browse/YARN-3243
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-3243.1.patch, YARN-3243.2.patch
>
>
> Now CapacityScheduler has some issues making sure a ParentQueue always obeys 
> its capacity limits, for example:
> 1) When allocating a container of a parent queue, it will only check 
> parentQueue.usage < parentQueue.max. If a leaf queue allocates a container.size 
> > (parentQueue.max - parentQueue.usage), the parent queue can exceed its max 
> resource limit, as in the following example:
> {code}
> A  (usage=54, max=55)
>/ \
>   A1 A2 (usage=1, max=55)
> (usage=53, max=53)
> {code}
> Queue-A2 is able to allocate a container since its usage < max, but if we do 
> that, A's usage can exceed A.max.
> 2) When doing the continuous reservation check, the parent queue will only tell 
> children "you need to unreserve *some* resource, so that I will be less than my 
> maximum resource", but it will not tell how much resource needs to be 
> unreserved. This may lead to the parent queue exceeding its configured maximum 
> capacity as well.
> With YARN-3099/YARN-3124, now we have a {{ResourceUsage}} class in each queue, 
> *here is my proposal*:
> - ParentQueue will set its children's ResourceUsage.headroom, which means the 
> *maximum resource its children can allocate*.
> - ParentQueue will set its children's headroom to be (saying the parent's name 
> is "qA"): min(qA.headroom, qA.max - qA.used). This will make sure qA's 
> ancestors' capacity will be enforced as well (qA.headroom is set by qA's 
> parent).
> - {{needToUnReserve}} is not necessary; instead, children can get how much 
> resource needs to be unreserved to keep within their parent's resource limit.
> - Moreover, with this, YARN-3026 will make a clear boundary between 
> LeafQueue and FiCaSchedulerApp, and headroom will consider user-limit, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)

2015-03-10 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355448#comment-14355448
 ] 

Vinod Kumar Vavilapalli commented on YARN-2495:
---

bq. I'm wondering if we might introduce a situation where a script error or 
other configuration issue could bring down an entire cluster 
Exactly my concern too.

> Allow admin specify labels from each NM (Distributed configuration)
> ---
>
> Key: YARN-2495
> URL: https://issues.apache.org/jira/browse/YARN-2495
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, 
> YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, 
> YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, 
> YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, 
> YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, 
> YARN-2495_20141022.1.patch
>
>
> Target of this JIRA is to allow admin specify labels in each NM, this covers
> - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or 
> using script suggested by [~aw] (YARN-2729) )
> - NM will send labels to RM via ResourceTracker API
> - RM will set labels in NodeLabelManager when NM register/update labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)

2015-03-10 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355436#comment-14355436
 ] 

Wangda Tan commented on YARN-2495:
--

Hi Naga,
--
bq. Well modifications side is clear to me but is it good to allow the 
configurations being different from NM and RM ? Infact i wanted to discuss 
regarding whether to send shutdown during register if NM is configured 
differently from RM, but waited for the base changes to go in before discussing 
new stuff.

It does not make the configuration different. My thinking is that the NM doesn't 
need to understand what "distributed-configuration" is; the admin should know it. When 
the node-label configuration is "distributed-configuration", the NM should go ahead 
and properly configure the script provider, etc.
So we aren't trying to create a difference, we just eliminate one option on the NM 
side.

--
bq.  I feel better to Log this as "Error" as we are sending the labels only in 
case of any change and there has to be some way to identify if labels for a 
given NM and also currently we are sending out shutdown signal too.

What I meant was these two lines:
{code}
498   LOG.info("Node Labels {" + StringUtils.join(",", nodeLabels)
499   + "} from Node " + nodeId + " were Accepted from RM");
{code}
I guess you may have misread my comment.

--
For the field to indicate if node labels are set in 
NodeHeartbeatRequest/NodeRegistrationRequest, there're two proposals: 
- setAreNodeLabelsSetInReq
- setAreNodeLabelsUpdated
Which one do you prefer, Vinod/Craig? I vote for the latter one :)

--
bq. We should not even accept a node's registration when it reports invalid 
labels
IIUC, the patch now already rejects a node when it reports invalid labels.

> Allow admin specify labels from each NM (Distributed configuration)
> ---
>
> Key: YARN-2495
> URL: https://issues.apache.org/jira/browse/YARN-2495
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, 
> YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, 
> YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, 
> YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, 
> YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, 
> YARN-2495_20141022.1.patch
>
>
> Target of this JIRA is to allow admin specify labels in each NM, this covers
> - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or 
> using script suggested by [~aw] (YARN-2729) )
> - NM will send labels to RM via ResourceTracker API
> - RM will set labels in NodeLabelManager when NM register/update labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)

2015-03-10 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355437#comment-14355437
 ] 

Wangda Tan commented on YARN-2495:
--

Actually I've thought about this before, but since node-labels.enabled has 
already shipped with Hadoop 2.7, we cannot change this.

> Allow admin specify labels from each NM (Distributed configuration)
> ---
>
> Key: YARN-2495
> URL: https://issues.apache.org/jira/browse/YARN-2495
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, 
> YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, 
> YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, 
> YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, 
> YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, 
> YARN-2495_20141022.1.patch
>
>
> Target of this JIRA is to allow admin specify labels in each NM, this covers
> - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or 
> using script suggested by [~aw] (YARN-2729) )
> - NM will send labels to RM via ResourceTracker API
> - RM will set labels in NodeLabelManager when NM register/update labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2140) Add support for network IO isolation/scheduling for containers

2015-03-10 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355420#comment-14355420
 ] 

Sidharta Seethana commented on YARN-2140:
-

[~bikassaha] Thanks. I ran into this paper (and a couple of others) when 
looking at [YARN-3|https://issues.apache.org/jira/browse/YARN-3]  

> Add support for network IO isolation/scheduling for containers
> --
>
> Key: YARN-2140
> URL: https://issues.apache.org/jira/browse/YARN-2140
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: NetworkAsAResourceDesign.pdf
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1884) ContainerReport should have nodeHttpAddress

2015-03-10 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355392#comment-14355392
 ] 

Zhijie Shen commented on YARN-1884:
---

Looks good to me overall. In the test case, can we change the test string from 
host:port to http://host:port? It's not that the current string doesn't pass 
the test, but I hope it could be more representative of a real 
HTTP node address.

> ContainerReport should have nodeHttpAddress
> ---
>
> Key: YARN-1884
> URL: https://issues.apache.org/jira/browse/YARN-1884
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Xuan Gong
> Attachments: YARN-1884.1.patch, YARN-1884.2.patch
>
>
> In the web UI, we're going to show the node, which used to be a link to the NM 
> web page. However, on the AHS web UI, and the RM web UI after YARN-1809, the node 
> field has to be set to the nodeID where the container is allocated. We need to 
> add nodeHttpAddress to the ContainerReport to link users to the NM web page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3187) Documentation of Capacity Scheduler Queue mapping based on user or group

2015-03-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355397#comment-14355397
 ] 

Hudson commented on YARN-3187:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7298 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7298/])
YARN-3187. Documentation of Capacity Scheduler Queue mapping based on user or 
group. Contributed by Gururaj Shetty (jianhe: rev 
a380643d2044a4974e379965f65066df2055d003)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/CapacityScheduler.md
* hadoop-yarn-project/CHANGES.txt


> Documentation of Capacity Scheduler Queue mapping based on user or group
> 
>
> Key: YARN-3187
> URL: https://issues.apache.org/jira/browse/YARN-3187
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, documentation
>Affects Versions: 2.6.0
>Reporter: Naganarasimha G R
>Assignee: Gururaj Shetty
>  Labels: documentation
> Fix For: 2.7.0
>
> Attachments: YARN-3187.1.patch, YARN-3187.2.patch, YARN-3187.3.patch, 
> YARN-3187.4.patch
>
>
> YARN-2411 exposes a very useful feature {{support simple user and group 
> mappings to queues}} but its not captured in the documentation. So in this 
> jira we plan to document this feature



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2498) Respect labels in preemption policy of capacity scheduler

2015-03-10 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355383#comment-14355383
 ] 

Wangda Tan commented on YARN-2498:
--

Hi [~eepayne],
Thanks for the review, but this patch is actually a little out of date, and 
[~mayank_bansal] is working on a new patch, which will consider YARN-3214 and 
some other issues. Hope you can help with the code review when Mayank attaches the 
new patch.

Sorry for this; I reassigned to Mayank and marked this as in-progress to avoid 
confusion.

Wangda

> Respect labels in preemption policy of capacity scheduler
> -
>
> Key: YARN-2498
> URL: https://issues.apache.org/jira/browse/YARN-2498
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Mayank Bansal
> Attachments: YARN-2498.patch, YARN-2498.patch, YARN-2498.patch, 
> yarn-2498-implementation-notes.pdf
>
>
> There're 3 stages in ProportionalCapacityPreemptionPolicy:
> # Recursively calculate {{ideal_assigned}} for each queue. This depends on the 
> available resource, the resource used/pending in each queue and the guaranteed 
> capacity of each queue.
> # Mark to-be-preempted containers: for each over-satisfied queue, it will 
> mark some containers to be preempted.
> # Notify the scheduler about the to-be-preempted containers.
> We need to respect labels in the cluster for both #1 and #2:
> For #1, when there's some resource available in the cluster, we shouldn't 
> assign it to a queue (by increasing {{ideal_assigned}}) if the queue cannot 
> access such labels.
> For #2, when we make a decision about whether we need to preempt a container, we 
> need to make sure the resource of this container is *possibly* usable by a queue 
> which is under-satisfied and has pending resource.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2140) Add support for network IO isolation/scheduling for containers

2015-03-10 Thread Sidharta Seethana (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355377#comment-14355377
 ] 

Sidharta Seethana commented on YARN-2140:
-

You are right - there are several areas to think about here and we definitely 
need to put in more thought w.r.t scheduling. In order to be able to do 
effective scheduling for network resources, we would need to understand a) the 
overall network topology in place for the cluster in question - characteristics 
of the ‘route’ between any two nodes in the cluster - number of hops required 
and the available/max bandwidth at each point in the route. b) application 
characteristics w.r.t network utilization - internal/external traffic, latency 
vs. bandwidth sensitivities etc. With regards to inbound traffic, we currently 
do not have a good way to effectively manage traffic - when inbound packets 
are being ‘examined’ on a given node, they have already consumed bandwidth 
along the way - and the only option we have is to drop them immediately (we 
cannot queue on the inbound side) or let them through - the design document 
mentions these limitations. One possible approach here could be to let the 
application provide ‘hints’  for inbound network utilization (not all 
applications might be able to do this) and use this information purely for 
scheduling purposes. This, of course, adds more complexity to scheduling. 

Needless to say, there are hard problems to solve here - and the (network) 
scheduling requirements (and potential approaches for implementation) will need 
further investigation. As a first step, though, I think it makes sense to focus 
on classification of outbound traffic (net_cls) and maybe basic 
isolation/enforcement + collection of metrics. Once we have this in place - we 
could look at real utilization patterns and decide what the next steps should 
be. 
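
For illustration, a rough sketch of what classification of outbound traffic via 
net_cls could look like at the lowest level. This assumes a cgroup v1 net_cls 
hierarchy mounted at /sys/fs/cgroup/net_cls and a tc class/filter keyed on the 
classid; the paths, the hierarchy name, and the 0xAAAABBBB handle below are 
illustrative assumptions, not YARN's actual implementation:

{code}
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class NetClsSketch {
  // Tag every process in a container's cgroup with a net_cls classid so an
  // outbound tc class/filter can shape its traffic.
  static void tagContainer(String containerId, int major, int minor, long pid)
      throws Exception {
    Path cgroup = Paths.get("/sys/fs/cgroup/net_cls/hadoop-yarn", containerId);
    Files.createDirectories(cgroup);

    // net_cls.classid is written as 0xAAAABBBB (tc major:minor handle).
    long classid = ((long) major << 16) | minor;
    Files.write(cgroup.resolve("net_cls.classid"),
        String.format("0x%08x", classid).getBytes(StandardCharsets.UTF_8));

    // Moving the container's PID into the cgroup applies the tag to all
    // sockets created by the process afterwards.
    Files.write(cgroup.resolve("tasks"),
        String.valueOf(pid).getBytes(StandardCharsets.UTF_8));
  }
}
{code}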




> Add support for network IO isolation/scheduling for containers
> --
>
> Key: YARN-2140
> URL: https://issues.apache.org/jira/browse/YARN-2140
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: NetworkAsAResourceDesign.pdf
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2498) Respect labels in preemption policy of capacity scheduler

2015-03-10 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2498:
-
Assignee: Mayank Bansal  (was: Wangda Tan)

> Respect labels in preemption policy of capacity scheduler
> -
>
> Key: YARN-2498
> URL: https://issues.apache.org/jira/browse/YARN-2498
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Mayank Bansal
> Attachments: YARN-2498.patch, YARN-2498.patch, YARN-2498.patch, 
> yarn-2498-implementation-notes.pdf
>
>
> There're 3 stages in ProportionalCapacityPreemptionPolicy:
> # Recursively calculate {{ideal_assigned}} for each queue. This depends on the 
> available resource, the resource used/pending in each queue and the guaranteed 
> capacity of each queue.
> # Mark to-be-preempted containers: for each over-satisfied queue, it will 
> mark some containers to be preempted.
> # Notify the scheduler about the to-be-preempted containers.
> We need to respect labels in the cluster for both #1 and #2:
> For #1, when there's some resource available in the cluster, we shouldn't 
> assign it to a queue (by increasing {{ideal_assigned}}) if the queue cannot 
> access such labels.
> For #2, when we make a decision about whether we need to preempt a container, we 
> need to make sure the resource of this container is *possibly* usable by a queue 
> which is under-satisfied and has pending resource.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2498) Respect labels in preemption policy of capacity scheduler

2015-03-10 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355361#comment-14355361
 ] 

Eric Payne commented on YARN-2498:
--

Hi [~leftnoteasy]. Great job on this patch. I have one minor nit:

Would you mind changing {{duductAvailableResourceAccordingToLabel}} to 
{{deductAvailableResourceAccordingToLabel}}? That is, {{duduct...}} should be 
{{deduct...}}.

> Respect labels in preemption policy of capacity scheduler
> -
>
> Key: YARN-2498
> URL: https://issues.apache.org/jira/browse/YARN-2498
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2498.patch, YARN-2498.patch, YARN-2498.patch, 
> yarn-2498-implementation-notes.pdf
>
>
> There're 3 stages in ProportionalCapacityPreemptionPolicy:
> # Recursively calculate {{ideal_assigned}} for each queue. This depends on the 
> available resource, the resource used/pending in each queue and the guaranteed 
> capacity of each queue.
> # Mark to-be-preempted containers: for each over-satisfied queue, it will 
> mark some containers to be preempted.
> # Notify the scheduler about the to-be-preempted containers.
> We need to respect labels in the cluster for both #1 and #2:
> For #1, when there's some resource available in the cluster, we shouldn't 
> assign it to a queue (by increasing {{ideal_assigned}}) if the queue cannot 
> access such labels.
> For #2, when we make a decision about whether we need to preempt a container, we 
> need to make sure the resource of this container is *possibly* usable by a queue 
> which is under-satisfied and has pending resource.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3187) Documentation of Capacity Scheduler Queue mapping based on user or group

2015-03-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355359#comment-14355359
 ] 

Hadoop QA commented on YARN-3187:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12703641/YARN-3187.4.patch
  against trunk revision 20b8ee1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6902//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6902//console

This message is automatically generated.

> Documentation of Capacity Scheduler Queue mapping based on user or group
> 
>
> Key: YARN-3187
> URL: https://issues.apache.org/jira/browse/YARN-3187
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, documentation
>Affects Versions: 2.6.0
>Reporter: Naganarasimha G R
>Assignee: Gururaj Shetty
>  Labels: documentation
> Attachments: YARN-3187.1.patch, YARN-3187.2.patch, YARN-3187.3.patch, 
> YARN-3187.4.patch
>
>
> YARN-2411 exposes a very useful feature {{support simple user and group 
> mappings to queues}} but its not captured in the documentation. So in this 
> jira we plan to document this feature



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3330) Implement a protobuf compatibility checker to check if a patch breaks the compatibility with existing client and internal protocols

2015-03-10 Thread Li Lu (JIRA)
Li Lu created YARN-3330:
---

 Summary: Implement a protobuf compatibility checker to check if a 
patch breaks the compatibility with existing client and internal protocols
 Key: YARN-3330
 URL: https://issues.apache.org/jira/browse/YARN-3330
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Li Lu
Assignee: Li Lu


Per YARN-3292, we may want to start the YARN rolling upgrade test compatibility 
verification tool with a simple script that checks protobuf compatibility. The 
script may work on incoming patch files, check if there are any changes to 
protobuf files, and report any potentially incompatible changes (line removals, 
etc.). We may want the tool to be conservative: it may report false positives, 
but we should minimize its chance of false negatives.
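
For illustration, a conservative first cut could be as small as the following 
sketch, which scans a unified diff and flags any removed line inside a .proto file. 
The class name is made up, and a real tool would understand protobuf syntax rather 
than rely on this line-level heuristic:

{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class ProtoPatchChecker {
  public static void main(String[] args) throws IOException {
    // args[0]: path to a unified-diff patch file, e.g. a JIRA attachment.
    List<String> lines = Files.readAllLines(Paths.get(args[0]));

    boolean inProtoFile = false;
    int suspicious = 0;
    for (String line : lines) {
      if (line.startsWith("+++ ") || line.startsWith("--- ")) {
        // Track whether the current hunk touches a .proto file.
        inProtoFile = line.trim().endsWith(".proto");
      } else if (inProtoFile && line.startsWith("-")) {
        // Conservative: any removed line in a .proto file is reported as a
        // potential incompatibility (may well be a false positive).
        System.out.println("Potentially incompatible removal: " + line);
        suspicious++;
      }
    }
    System.exit(suspicious == 0 ? 0 : 1);
  }
}
{code}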



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3187) Documentation of Capacity Scheduler Queue mapping based on user or group

2015-03-10 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355300#comment-14355300
 ] 

Jian He commented on YARN-3187:
---

looks good, +1

> Documentation of Capacity Scheduler Queue mapping based on user or group
> 
>
> Key: YARN-3187
> URL: https://issues.apache.org/jira/browse/YARN-3187
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, documentation
>Affects Versions: 2.6.0
>Reporter: Naganarasimha G R
>Assignee: Gururaj Shetty
>  Labels: documentation
> Attachments: YARN-3187.1.patch, YARN-3187.2.patch, YARN-3187.3.patch, 
> YARN-3187.4.patch
>
>
> YARN-2411 exposes a very useful feature {{support simple user and group 
> mappings to queues}} but its not captured in the documentation. So in this 
> jira we plan to document this feature



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)

2015-03-10 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355283#comment-14355283
 ] 

Naganarasimha G R commented on YARN-2495:
-

Hi [~wangda] & [~vinodkv],
Also one query/suggestion: would it be better to have a single configuration for 
node labels, 
yarn.node-labels.configuration.type= (with 
default as disabled), instead of the currently available 2 configurations, i.e. 
"yarn.node-labels.enabled" and "yarn.node-labels.configuration.type", so that 
we can avoid one more configuration?
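
For illustration, a sketch of how such a single property could be consumed. The 
"centralized"/"distributed" values and the enum are assumptions extrapolated from 
this thread; only the property name and the "disabled" default come from the 
suggestion above:

{code}
import org.apache.hadoop.conf.Configuration;

public class NodeLabelsConfigSketch {
  enum NodeLabelsConfigType { DISABLED, CENTRALIZED, DISTRIBUTED }

  static NodeLabelsConfigType getType(Configuration conf) {
    // Single knob instead of yarn.node-labels.enabled plus
    // yarn.node-labels.configuration.type; "disabled" doubles as "off".
    String value = conf.get("yarn.node-labels.configuration.type", "disabled");
    return NodeLabelsConfigType.valueOf(value.trim().toUpperCase());
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("yarn.node-labels.configuration.type", "distributed");
    System.out.println(getType(conf));  // prints DISTRIBUTED
  }
}
{code}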

> Allow admin specify labels from each NM (Distributed configuration)
> ---
>
> Key: YARN-2495
> URL: https://issues.apache.org/jira/browse/YARN-2495
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
> Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, 
> YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, 
> YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, 
> YARN-2495.20141204-1.patch, YARN-2495.20141208-1.patch, 
> YARN-2495.20150305-1.patch, YARN-2495.20150309-1.patch, 
> YARN-2495_20141022.1.patch
>
>
> Target of this JIRA is to allow admin specify labels in each NM, this covers
> - User can set labels in each NM (by setting yarn-site.xml (YARN-2923) or 
> using script suggested by [~aw] (YARN-2729) )
> - NM will send labels to RM via ResourceTracker API
> - RM will set labels in NodeLabelManager when NM register/update labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2928) Application Timeline Server (ATS) next gen: phase 1

2015-03-10 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355203#comment-14355203
 ] 

Sangjin Lee commented on YARN-2928:
---

I like the name TimelineCollector. [~zjshen], [~vinodkv], let me know if you 
are OK with the name, and I can make a quick refactoring patch.

> Application Timeline Server (ATS) next gen: phase 1
> ---
>
> Key: YARN-2928
> URL: https://issues.apache.org/jira/browse/YARN-2928
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>Priority: Critical
> Attachments: ATSv2.rev1.pdf, ATSv2.rev2.pdf, Data model proposal 
> v1.pdf, Timeline Service Next Gen - Planning - ppt.pptx
>
>
> We have the application timeline server implemented in yarn per YARN-1530 and 
> YARN-321. Although it is a great feature, we have recognized several critical 
> issues and features that need to be addressed.
> This JIRA proposes the design and implementation changes to address those. 
> This is phase 1 of this effort.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so

2015-03-10 Thread Mit Desai (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355155#comment-14355155
 ] 

Mit Desai commented on YARN-2890:
-

I was not aware of this Jira being reopened. I will take a look in a day or two.

> MiniMRYarnCluster should turn on timeline service if configured to do so
> 
>
> Key: YARN-2890
> URL: https://issues.apache.org/jira/browse/YARN-2890
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, 
> YARN-2890.patch
>
>
> Currently the MiniMRYarnCluster does not consider the configuration value for 
> enabling timeline service before starting. The MiniYarnCluster should only 
> start the timeline service if it is configured to do so.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3215) Respect labels in CapacityScheduler when computing headroom

2015-03-10 Thread patrick white (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355145#comment-14355145
 ] 

patrick white commented on YARN-3215:
-

Hi, I am trying to reproduce the failure case as part of label feature 
verification for our use cases, and so far the headroom calculation appears to 
behave correctly. Would it be possible to provide a specific failure scenario for 
this issue?

There were a number of challenges getting the yarn-site and capacity properties 
correctly set; we believe we have those in place now. With both cases of 
jobs running on labelled and non-labelled resources, we are seeing the task 
execution staying on the correct nodes (a labelled job will only task out to 
matching-labelled nodes, a non-labelled job will not task to labelled nodes), 
and the headroom calculation from the AM logs shows headroom memory dropping to 0 
within 5 seconds of job start. This is observed even with small-capacity run queues 
for the jobs and with 'slowstart' set to 0.


> Respect labels in CapacityScheduler when computing headroom
> ---
>
> Key: YARN-3215
> URL: https://issues.apache.org/jira/browse/YARN-3215
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>
> In existing CapacityScheduler, when computing headroom of an application, it 
> will only consider "non-labeled" nodes of this application.
> But it is possible the application is asking for labeled resources, so 
> headroom-by-label (like 5G resource available under node-label=red) is 
> required to get better resource allocation and avoid deadlocks such as 
> MAPREDUCE-5928.
> This JIRA could involve both API changes (such as adding a 
> label-to-available-resource map in AllocateResponse) and also internal 
> changes in CapacityScheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-3325) [JDK8] build failed with JDK 8 with error: package org.apache.hadoop.yarn.util has already been annotated

2015-03-10 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer resolved YARN-3325.

Resolution: Duplicate

> [JDK8] build failed with JDK 8 with error: package 
> org.apache.hadoop.yarn.util has already been annotated
> -
>
> Key: YARN-3325
> URL: https://issues.apache.org/jira/browse/YARN-3325
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.6.0
>Reporter: zhubin
>  Labels: build, maven
>
> [ERROR] 
> /root/bigtop/build/hadoop/rpm/BUILD/hadoop-2.6.0-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/package-info.java:18:
>  error: package org.apache.hadoop.yarn.util has already been annotated
> [ERROR] @InterfaceAudience.Public
> [ERROR] ^
> [ERROR] java.lang.AssertionError
> [ERROR] at com.sun.tools.javac.util.Assert.error(Assert.java:126)
> [ERROR] at com.sun.tools.javac.util.Assert.check(Assert.java:45)
> [ERROR] at 
> com.sun.tools.javac.code.SymbolMetadata.setDeclarationAttributesWithCompletion(SymbolMetadata.java:161)
> [ERROR] at 
> com.sun.tools.javac.code.Symbol.setDeclarationAttributesWithCompletion(Symbol.java:215)
> [ERROR] at 
> com.sun.tools.javac.comp.MemberEnter.actualEnterAnnotations(MemberEnter.java:952)
> [ERROR] at 
> com.sun.tools.javac.comp.MemberEnter.access$600(MemberEnter.java:64)
> [ERROR] at com.sun.tools.javac.comp.MemberEnter$5.run(MemberEnter.java:876)
> [ERROR] at com.sun.tools.javac.comp.Annotate.flush(Annotate.java:143)
> [ERROR] at com.sun.tools.javac.comp.Annotate.enterDone(Annotate.java:129)
> [ERROR] at com.sun.tools.javac.comp.Enter.complete(Enter.java:512)
> [ERROR] at com.sun.tools.javac.comp.Enter.main(Enter.java:471)
> [ERROR] at com.sun.tools.javadoc.JavadocEnter.main(JavadocEnter.java:78)
> [ERROR] at 
> com.sun.tools.javadoc.JavadocTool.getRootDocImpl(JavadocTool.java:186)
> [ERROR] at com.sun.tools.javadoc.Start.parseAndExecute(Start.java:346)
> [ERROR] at com.sun.tools.javadoc.Start.begin(Start.java:219)
> [ERROR] at com.sun.tools.javadoc.Start.begin(Start.java:205)
> [ERROR] at com.sun.tools.javadoc.Main.execute(Main.java:64)
> [ERROR] at com.sun.tools.javadoc.Main.main(Main.java:54)
> [ERROR] javadoc: error - fatal error
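
One way this javadoc assertion can be triggered under JDK 8 (offered as an 
assumption about the failure mode, not a confirmed root cause) is the same 
package being annotated by more than one package-info.java reachable on the 
javadoc source path, for example a hand-written file plus a duplicated or 
generated copy:

{code}
// src/main/java/org/apache/hadoop/yarn/util/package-info.java
@InterfaceAudience.Public
package org.apache.hadoop.yarn.util;
import org.apache.hadoop.classification.InterfaceAudience;

// If a second copy of this package-info.java is visible to javadoc (e.g. via
// a duplicated or generated source root), the package is annotated twice and
// JDK 8 javadoc fails with
// "package org.apache.hadoop.yarn.util has already been annotated".
{code}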



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so

2015-03-10 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355127#comment-14355127
 ] 

Tsuyoshi Ozawa commented on YARN-2890:
--

{code}
   public MiniYARNCluster(
   String testName, int numResourceManagers, int numNodeManagers,
-  int numLocalDirs, int numLogDirs, boolean enableAHS) {
+  int numLocalDirs, int numLogDirs) {
{code}

[~mitdesai], I think we can keep backward compatibility if we have both 
constructors. Do you mind updating the patch to include both of them?
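
A minimal sketch of the compatibility-preserving overload being suggested, 
assuming the shorter constructor simply delegates with a default value; the 
delegation and the default shown here are assumptions, not the committed patch:

{code}
public class MiniYARNCluster /* sketch skeleton, not the real class body */ {

  // Newer constructor without the flag; delegates with a default so the
  // previous behavior is preserved.
  public MiniYARNCluster(String testName, int numResourceManagers,
      int numNodeManagers, int numLocalDirs, int numLogDirs) {
    this(testName, numResourceManagers, numNodeManagers, numLocalDirs,
        numLogDirs, false);
  }

  // Existing constructor kept as-is so current callers keep compiling.
  public MiniYARNCluster(String testName, int numResourceManagers,
      int numNodeManagers, int numLocalDirs, int numLogDirs,
      boolean enableAHS) {
    // existing initialization elided in this sketch
  }
}
{code}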

> MiniMRYarnCluster should turn on timeline service if configured to do so
> 
>
> Key: YARN-2890
> URL: https://issues.apache.org/jira/browse/YARN-2890
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, 
> YARN-2890.patch
>
>
> Currently the MiniMRYarnCluster does not consider the configuration value for 
> enabling the timeline service before starting. The MiniYARNCluster should 
> only start the timeline service if it is configured to do so.
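
A sketch of the configuration check the description calls for; where and how 
MiniMRYarnCluster would apply it is an assumption here, not the committed 
change:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineEnabledCheckSketch {
  // Returns true only when yarn.timeline-service.enabled is set to true.
  public static boolean shouldStartTimelineService(Configuration conf) {
    return conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED,
        YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED);
  }
}
{code}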



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.

2015-03-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355076#comment-14355076
 ] 

Hudson commented on YARN-3287:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2078 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2078/])
YARN-3287. Made TimelineClient put methods do as the correct login context. 
Contributed by Daryn Sharp and Jonathan Eagles. (zjshen: rev 
d6e05c5ee26feefc17267b7c9db1e2a3dbdef117)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/api/impl/TimelineClientImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/security/TestTimelineAuthenticationFilter.java


> TimelineClient kerberos authentication failure uses wrong login context.
> 
>
> Key: YARN-3287
> URL: https://issues.apache.org/jira/browse/YARN-3287
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Daryn Sharp
> Fix For: 2.7.0
>
> Attachments: YARN-3287.1.patch, YARN-3287.2.patch, YARN-3287.3.patch, 
> timeline.patch
>
>
> TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause YARN 
> clients to fail to create timeline domains during job submission.
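
A minimal sketch of the doAs pattern the description refers to, using a 
hypothetical helper; the actual change lives inside TimelineClientImpl and may 
differ:

{code}
import java.io.IOException;
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.api.records.timeline.TimelineDomain;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class TimelineDoAsSketch {
  // Runs the domain creation as the submitting user's UGI instead of the
  // current login context, so the right Kerberos credentials are used.
  public static void putDomainAsUser(final TimelineClient client,
      final TimelineDomain domain, UserGroupInformation submitter)
      throws IOException, InterruptedException {
    submitter.doAs(new PrivilegedExceptionAction<Void>() {
      @Override
      public Void run() throws IOException, YarnException {
        client.putDomain(domain);
        return null;
      }
    });
  }
}
{code}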



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3300) outstanding_resource_requests table should not be shown in AHS

2015-03-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355072#comment-14355072
 ] 

Hudson commented on YARN-3300:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2078 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2078/])
YARN-3300. Outstanding_resource_requests table should not be shown in AHS. 
Contributed by Xuan Gong (jianhe: rev c3003eba6f9802f15699564a5eb7c6e34424cb14)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppAttemptPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppAttemptPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppAttemptBlock.java
* hadoop-yarn-project/CHANGES.txt


> outstanding_resource_requests table should not be shown in AHS
> --
>
> Key: YARN-3300
> URL: https://issues.apache.org/jira/browse/YARN-3300
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Fix For: 2.7.0
>
> Attachments: YARN-3300.1.patch, YARN-3300.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

