[jira] [Commented] (YARN-1068) Add admin support for HA operations

2013-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787739#comment-13787739
 ] 

Hadoop QA commented on YARN-1068:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12607094/yarn-1068-9.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2135//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2135//console

This message is automatically generated.

> Add admin support for HA operations
> ---
>
> Key: YARN-1068
> URL: https://issues.apache.org/jira/browse/YARN-1068
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>  Labels: ha
> Attachments: yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, 
> yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, 
> yarn-1068-8.patch, yarn-1068-9.patch, yarn-1068-prelim.patch
>
>
> Support HA admin operations to facilitate transitioning the RM to Active and 
> Standby states.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1277) Add http policy support for YARN daemons

2013-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787734#comment-13787734
 ] 

Hudson commented on YARN-1277:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4554 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4554/])
YARN-1277. Added a policy based configuration for http/https in common 
HttpServer and using the same in YARN - related
to per project https config support via HADOOP-10022. Contributed by Suresh 
Srinivas and Omkar Vinit Joshi. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529662)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeysPublic.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/http/HttpConfig.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/http/TestSSLHttpServer.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/AppController.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/jobhistory/JHAdminConfig.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRWebAppUtil.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRConfig.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JobHistoryServer.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/WebAppUtils.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/ProxyUriUtils.java


> Add http policy support for YARN daemons
> 
>
> Key: YARN-1277
> URL: https://issues.apache.org/jira/browse/YARN-1277
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.0.0-alpha
>Reporter: Suresh Srinivas
>Assignee: Omkar Vinit Joshi
> Fix For: 2.1.2-beta
>
> Attachments: YARN-1277.20131005.1.patch, YARN-1277.20131005.2.patch, 
> YARN-1277.20131005.3.patch, YARN-1277.patch
>
>
> This is the YARN part of HADOOP-10022.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1149) NM throws InvalidStateTransitonException: Invalid event: APPLICATION_LOG_HANDLING_FINISHED at RUNNING

2013-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787727#comment-13787727
 ] 

Hudson commented on YARN-1149:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4553 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4553/])
YARN-1278. Fixed NodeManager to not delete local resources for apps on resync 
command from RM - a bug caused by YARN-1149. Contributed by Hitesh Shah. 
(vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529657)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/CMgrCompletedContainersEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerResync.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java


> NM throws InvalidStateTransitonException: Invalid event: 
> APPLICATION_LOG_HANDLING_FINISHED at RUNNING
> -
>
> Key: YARN-1149
> URL: https://issues.apache.org/jira/browse/YARN-1149
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Ramya Sunil
>Assignee: Xuan Gong
> Fix For: 2.1.2-beta
>
> Attachments: YARN-1149.1.patch, YARN-1149.2.patch, YARN-1149.3.patch, 
> YARN-1149.4.patch, YARN-1149.5.patch, YARN-1149.6.patch, YARN-1149.7.patch, 
> YARN-1149.8.patch, YARN-1149.9.patch, YARN-1149_branch-2.1-beta.1.patch
>
>
> When the nodemanager receives a kill signal after an application has finished 
> execution but log aggregation has not yet kicked in, an 
> InvalidStateTransitonException (Invalid event: 
> APPLICATION_LOG_HANDLING_FINISHED at RUNNING) is thrown
> {noformat}
> 2013-08-25 20:45:00,875 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:finishLogAggregation(254)) - Application just 
> finished : application_1377459190746_0118
> 2013-08-25 20:45:00,876 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:uploadLogsForContainer(105)) - Starting aggregate 
> log-file for app application_1377459190746_0118 at 
> /app-logs/foo/logs/application_1377459190746_0118/_45454.tmp
> 2013-08-25 20:45:00,876 INFO  logaggregation.LogAggregationService 
> (LogAggregationService.java:stopAggregators(151)) - Waiting for aggregation 
> to complete for application_1377459190746_0118
> 2013-08-25 20:45:00,891 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:uploadLogsForContainer(122)) - Uploading logs for 
> container container_1377459190746_0118_01_04. Current good log dirs are 
> /tmp/yarn/local
> 2013-08-25 20:45:00,915 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:doAppLogAggregation(182)) - Finished aggregate 
> log-file for app application_1377459190746_0118
> 2013-08-25 20:45:00,925 WARN  application.Application 
> (ApplicationImpl.java:handle(427)) - Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> APPLICATION_LOG_HANDLING_FINISHED at RUNNING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>  
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:425)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:59)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:697)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$Applic

[jira] [Commented] (YARN-1278) New AM does not start after rm restart

2013-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787725#comment-13787725
 ] 

Hudson commented on YARN-1278:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4553 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4553/])
YARN-1278. Fixed NodeManager to not delete local resources for apps on resync 
command from RM - a bug caused by YARN-1149. Contributed by Hitesh Shah. 
(vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529657)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/CMgrCompletedContainersEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeManagerResync.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java


> New AM does not start after rm restart
> --
>
> Key: YARN-1278
> URL: https://issues.apache.org/jira/browse/YARN-1278
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.1-beta
>Reporter: Yesha Vora
>Assignee: Hitesh Shah
>Priority: Blocker
> Fix For: 2.1.2-beta
>
> Attachments: YARN-1278.1.patch, YARN-1278.2.patch, 
> YARN-1278.trunk.2.patch
>
>
> The new AM fails to start after the RM restarts: a new ApplicationMaster is not 
> launched, and the job fails with the error below.
>  /usr/bin/mapred job -status job_1380985373054_0001
> 13/10/05 15:04:04 INFO client.RMProxy: Connecting to ResourceManager at 
> hostname
> Job: job_1380985373054_0001
> Job File: /user/abc/.staging/job_1380985373054_0001/job.xml
> Job Tracking URL : 
> http://hostname:8088/cluster/app/application_1380985373054_0001
> Uber job : false
> Number of maps: 0
> Number of reduces: 0
> map() completion: 0.0
> reduce() completion: 0.0
> Job state: FAILED
> retired: false
> reason for failure: There are no failed tasks for the job. Job is failed due 
> to some other reason and reason can be found in the logs.
> Counters: 0



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (YARN-1281) TestZKRMStateStoreZKClientConnections fails intermittently

2013-10-06 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-1281:
--

 Summary: TestZKRMStateStoreZKClientConnections fails intermittently
 Key: YARN-1281
 URL: https://issues.apache.org/jira/browse/YARN-1281
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla


The test fails intermittently; I haven't been able to reproduce the failure 
deterministically. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1068) Add admin support for HA operations

2013-10-06 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1068:
---

Attachment: yarn-1068-9.patch

Patch to fix the javadoc and findbugs issues. The test failures seem unrelated.

> Add admin support for HA operations
> ---
>
> Key: YARN-1068
> URL: https://issues.apache.org/jira/browse/YARN-1068
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>  Labels: ha
> Attachments: yarn-1068-1.patch, yarn-1068-2.patch, yarn-1068-3.patch, 
> yarn-1068-4.patch, yarn-1068-5.patch, yarn-1068-6.patch, yarn-1068-7.patch, 
> yarn-1068-8.patch, yarn-1068-9.patch, yarn-1068-prelim.patch
>
>
> Support HA admin operations to facilitate transitioning the RM to Active and 
> Standby states.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1278) New AM does not start after rm restart

2013-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787717#comment-13787717
 ] 

Hadoop QA commented on YARN-1278:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12607092/YARN-1278.trunk.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2134//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2134//console

This message is automatically generated.

> New AM does not start after rm restart
> --
>
> Key: YARN-1278
> URL: https://issues.apache.org/jira/browse/YARN-1278
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.1-beta
>Reporter: Yesha Vora
>Assignee: Hitesh Shah
>Priority: Blocker
> Attachments: YARN-1278.1.patch, YARN-1278.2.patch, 
> YARN-1278.trunk.2.patch
>
>
> The new AM fails to start after the RM restarts: a new ApplicationMaster is not 
> launched, and the job fails with the error below.
>  /usr/bin/mapred job -status job_1380985373054_0001
> 13/10/05 15:04:04 INFO client.RMProxy: Connecting to ResourceManager at 
> hostname
> Job: job_1380985373054_0001
> Job File: /user/abc/.staging/job_1380985373054_0001/job.xml
> Job Tracking URL : 
> http://hostname:8088/cluster/app/application_1380985373054_0001
> Uber job : false
> Number of maps: 0
> Number of reduces: 0
> map() completion: 0.0
> reduce() completion: 0.0
> Job state: FAILED
> retired: false
> reason for failure: There are no failed tasks for the job. Job is failed due 
> to some other reason and reason can be found in the logs.
> Counters: 0



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1278) New AM does not start after rm restart

2013-10-06 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1278:
--

Attachment: YARN-1278.trunk.2.patch

Trunk patch.

> New AM does not start after rm restart
> --
>
> Key: YARN-1278
> URL: https://issues.apache.org/jira/browse/YARN-1278
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.1-beta
>Reporter: Yesha Vora
>Assignee: Hitesh Shah
>Priority: Blocker
> Attachments: YARN-1278.1.patch, YARN-1278.2.patch, 
> YARN-1278.trunk.2.patch
>
>
> The new AM fails to start after the RM restarts: a new ApplicationMaster is not 
> launched, and the job fails with the error below.
>  /usr/bin/mapred job -status job_1380985373054_0001
> 13/10/05 15:04:04 INFO client.RMProxy: Connecting to ResourceManager at 
> hostname
> Job: job_1380985373054_0001
> Job File: /user/abc/.staging/job_1380985373054_0001/job.xml
> Job Tracking URL : 
> http://hostname:8088/cluster/app/application_1380985373054_0001
> Uber job : false
> Number of maps: 0
> Number of reduces: 0
> map() completion: 0.0
> reduce() completion: 0.0
> Job state: FAILED
> retired: false
> reason for failure: There are no failed tasks for the job. Job is failed due 
> to some other reason and reason can be found in the logs.
> Counters: 0



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1278) New AM does not start after rm restart

2013-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787697#comment-13787697
 ] 

Hadoop QA commented on YARN-1278:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12607087/YARN-1278.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2133//console

This message is automatically generated.

> New AM does not start after rm restart
> --
>
> Key: YARN-1278
> URL: https://issues.apache.org/jira/browse/YARN-1278
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.1-beta
>Reporter: Yesha Vora
>Assignee: Hitesh Shah
>Priority: Blocker
> Attachments: YARN-1278.1.patch, YARN-1278.2.patch
>
>
> The new AM fails to start after the RM restarts: a new ApplicationMaster is not 
> launched, and the job fails with the error below.
>  /usr/bin/mapred job -status job_1380985373054_0001
> 13/10/05 15:04:04 INFO client.RMProxy: Connecting to ResourceManager at 
> hostname
> Job: job_1380985373054_0001
> Job File: /user/abc/.staging/job_1380985373054_0001/job.xml
> Job Tracking URL : 
> http://hostname:8088/cluster/app/application_1380985373054_0001
> Uber job : false
> Number of maps: 0
> Number of reduces: 0
> map() completion: 0.0
> reduce() completion: 0.0
> Job state: FAILED
> retired: false
> reason for failure: There are no failed tasks for the job. Job is failed due 
> to some other reason and reason can be found in the logs.
> Counters: 0



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1278) New AM does not start after rm restart

2013-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787696#comment-13787696
 ] 

Hadoop QA commented on YARN-1278:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12607087/YARN-1278.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2132//console

This message is automatically generated.

> New AM does not start after rm restart
> --
>
> Key: YARN-1278
> URL: https://issues.apache.org/jira/browse/YARN-1278
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.1-beta
>Reporter: Yesha Vora
>Assignee: Hitesh Shah
>Priority: Blocker
> Attachments: YARN-1278.1.patch, YARN-1278.2.patch
>
>
> The new AM fails to start after the RM restarts: a new ApplicationMaster is not 
> launched, and the job fails with the error below.
>  /usr/bin/mapred job -status job_1380985373054_0001
> 13/10/05 15:04:04 INFO client.RMProxy: Connecting to ResourceManager at 
> hostname
> Job: job_1380985373054_0001
> Job File: /user/abc/.staging/job_1380985373054_0001/job.xml
> Job Tracking URL : 
> http://hostname:8088/cluster/app/application_1380985373054_0001
> Uber job : false
> Number of maps: 0
> Number of reduces: 0
> map() completion: 0.0
> reduce() completion: 0.0
> Job state: FAILED
> retired: false
> reason for failure: There are no failed tasks for the job. Job is failed due 
> to some other reason and reason can be found in the logs.
> Counters: 0



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1278) New AM does not start after rm restart

2013-10-06 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1278:
--

Attachment: YARN-1278.2.patch

Same patch, but it
 - removes the switches on cleanup-containers and cleanup-apps,
 - adds a code comment, and
 - cleans up the test to explicitly validate the existence of the app.

The test in the previous patch failed without the main code changes and passed 
with them, so we are good.

Will check this in if Jenkins says okay too.
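
To make the resync-vs-cleanup distinction concrete, here is a minimal, purely 
illustrative Java sketch of the behaviour being discussed: on an RM-ordered resync 
the NM stops containers but keeps the apps' localized resources, while a real 
shutdown may still clean them up. This is not the YARN-1278 patch; the enum and 
class names below are invented for this example.

{code}
// Hypothetical sketch only -- not the actual YARN-1278 change.
// It distinguishes why containers are being torn down and keeps the
// localized resources when the cause is a resync rather than a shutdown.
enum TeardownReason { RESYNC, SHUTDOWN }

final class LocalResourceCleanupPolicy {
  /** Decide whether an app's localized resources should be deleted. */
  boolean shouldDeleteAppResources(TeardownReason reason) {
    // On resync, keep the local cache so apps need not re-localize everything.
    return reason == TeardownReason.SHUTDOWN;
  }
}
{code}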

> New AM does not start after rm restart
> --
>
> Key: YARN-1278
> URL: https://issues.apache.org/jira/browse/YARN-1278
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.1-beta
>Reporter: Yesha Vora
>Assignee: Hitesh Shah
>Priority: Blocker
> Attachments: YARN-1278.1.patch, YARN-1278.2.patch
>
>
> The new AM fails to start after the RM restarts: a new ApplicationMaster is not 
> launched, and the job fails with the error below.
>  /usr/bin/mapred job -status job_1380985373054_0001
> 13/10/05 15:04:04 INFO client.RMProxy: Connecting to ResourceManager at 
> hostname
> Job: job_1380985373054_0001
> Job File: /user/abc/.staging/job_1380985373054_0001/job.xml
> Job Tracking URL : 
> http://hostname:8088/cluster/app/application_1380985373054_0001
> Uber job : false
> Number of maps: 0
> Number of reduces: 0
> map() completion: 0.0
> reduce() completion: 0.0
> Job state: FAILED
> retired: false
> reason for failure: There are no failed tasks for the job. Job is failed due 
> to some other reason and reason can be found in the logs.
> Counters: 0



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1278) New AM does not start after rm restart

2013-10-06 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787689#comment-13787689
 ] 

Vinod Kumar Vavilapalli commented on YARN-1278:
---

bq.  For the purposes of this jira NM should delete resources on resync.
I suppose you mean 'shouldn't delete'? Time for a new keyboard!

> New AM does not start after rm restart
> --
>
> Key: YARN-1278
> URL: https://issues.apache.org/jira/browse/YARN-1278
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.1-beta
>Reporter: Yesha Vora
>Assignee: Hitesh Shah
>Priority: Blocker
> Attachments: YARN-1278.1.patch
>
>
> The new AM fails to start after the RM restarts: a new ApplicationMaster is not 
> launched, and the job fails with the error below.
>  /usr/bin/mapred job -status job_1380985373054_0001
> 13/10/05 15:04:04 INFO client.RMProxy: Connecting to ResourceManager at 
> hostname
> Job: job_1380985373054_0001
> Job File: /user/abc/.staging/job_1380985373054_0001/job.xml
> Job Tracking URL : 
> http://hostname:8088/cluster/app/application_1380985373054_0001
> Uber job : false
> Number of maps: 0
> Number of reduces: 0
> map() completion: 0.0
> reduce() completion: 0.0
> Job state: FAILED
> retired: false
> reason for failure: There are no failed tasks for the job. Job is failed due 
> to some other reason and reason can be found in the logs.
> Counters: 0



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1278) New AM does not start after rm restart

2013-10-06 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787680#comment-13787680
 ] 

Bikas Saha commented on YARN-1278:
--

(Keyboard messed up)
Even restart shouldn't delete existing resources, since that's an expensive loss. 
We should be able to reconstruct the cache state from the previously downloaded 
files. However, that's probably a big change.
For the purposes of this jira NM should delete resources on resync.

> New AM does not start after rm restart
> --
>
> Key: YARN-1278
> URL: https://issues.apache.org/jira/browse/YARN-1278
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.1-beta
>Reporter: Yesha Vora
>Assignee: Hitesh Shah
>Priority: Blocker
> Attachments: YARN-1278.1.patch
>
>
> The new AM fails to start after the RM restarts: a new ApplicationMaster is not 
> launched, and the job fails with the error below.
>  /usr/bin/mapred job -status job_1380985373054_0001
> 13/10/05 15:04:04 INFO client.RMProxy: Connecting to ResourceManager at 
> hostname
> Job: job_1380985373054_0001
> Job File: /user/abc/.staging/job_1380985373054_0001/job.xml
> Job Tracking URL : 
> http://hostname:8088/cluster/app/application_1380985373054_0001
> Uber job : false
> Number of maps: 0
> Number of reduces: 0
> map() completion: 0.0
> reduce() completion: 0.0
> Job state: FAILED
> retired: false
> reason for failure: There are no failed tasks for the job. Job is failed due 
> to some other reason and reason can be found in the logs.
> Counters: 0



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1278) New AM does not start after rm restart

2013-10-06 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787678#comment-13787678
 ] 

Bikas Saha commented on YARN-1278:
--

bq. I think on resync, we shouldn't destroy app resources. That is desired 
anyways as there is no need to just relocalize everything because of RM resync.
That is the way it should be. Resync is different from restart. Even restart 
shouldnt

> New AM does not start after rm restart
> --
>
> Key: YARN-1278
> URL: https://issues.apache.org/jira/browse/YARN-1278
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.1-beta
>Reporter: Yesha Vora
>Assignee: Hitesh Shah
>Priority: Blocker
> Attachments: YARN-1278.1.patch
>
>
> The new AM fails to start after the RM restarts: a new ApplicationMaster is not 
> launched, and the job fails with the error below.
>  /usr/bin/mapred job -status job_1380985373054_0001
> 13/10/05 15:04:04 INFO client.RMProxy: Connecting to ResourceManager at 
> hostname
> Job: job_1380985373054_0001
> Job File: /user/abc/.staging/job_1380985373054_0001/job.xml
> Job Tracking URL : 
> http://hostname:8088/cluster/app/application_1380985373054_0001
> Uber job : false
> Number of maps: 0
> Number of reduces: 0
> map() completion: 0.0
> reduce() completion: 0.0
> Job state: FAILED
> retired: false
> reason for failure: There are no failed tasks for the job. Job is failed due 
> to some other reason and reason can be found in the logs.
> Counters: 0



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-465) fix coverage org.apache.hadoop.yarn.server.webproxy

2013-10-06 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787658#comment-13787658
 ] 

Ravi Prakash commented on YARN-465:
---

Thanks for the updates and answers, Andrey!
I could not find a WebAppProxy.start() method. I don't see a problem with 
calling the main method from tests, and I would prefer that over changing 
src/main code to suit the tests. Removing the proxy.join() method makes the main() 
thread not wait for the proxy server thread to exit. I guess that could also be 
another way to do things, but I am wary of changing src/main code without 
good reason just for the purpose of writing unit tests.
Please correct me if I am wrong, but port will always be 0 at the time it is 
printed out in the last Log.info in the start() method.
I am satisfied with the other changes. Thanks a lot for them. I am not going to 
be a stickler for the above issues, so I'm going to give the trunk patch a +1.
branch-2 still has the "answer=3" option; maybe we can add it to trunk or 
remove it from branch-2. What do you think?
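
As an aside for readers following the test-design discussion, the sketch below 
shows the general pattern of driving a blocking main() from a test thread instead 
of modifying src/main for testability. It is only an illustration under assumed 
names (FakeProxyServer is invented); it is not the actual WebAppProxy code or the 
YARN-465 patch.

{code}
import java.util.concurrent.CountDownLatch;

// Illustration only: drive a server-style blocking main() from a test thread.
// "FakeProxyServer" is an invented stand-in, not the real WebAppProxy classes.
public class MainFromTestSketch {

  static class FakeProxyServer {
    static final CountDownLatch started = new CountDownLatch(1);

    public static void main(String[] args) throws InterruptedException {
      Thread serverThread = new Thread(new Runnable() {
        public void run() {
          started.countDown();                 // pretend the proxy is now serving
          try { Thread.sleep(Long.MAX_VALUE); } catch (InterruptedException ignored) { }
        }
      });
      serverThread.setDaemon(true);
      serverThread.start();
      serverThread.join();                     // like proxy.join(): main() blocks here
    }
  }

  public static void main(String[] args) throws Exception {
    // The "test" side: run main() on a background thread so the caller is not blocked.
    Thread driver = new Thread(new Runnable() {
      public void run() {
        try { FakeProxyServer.main(new String[0]); } catch (InterruptedException ignored) { }
      }
    });
    driver.setDaemon(true);
    driver.start();
    FakeProxyServer.started.await();           // wait until the fake server reports "up"
    System.out.println("server is up; a test could now make assertions against it");
  }
}
{code}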

> fix coverage  org.apache.hadoop.yarn.server.webproxy
> 
>
> Key: YARN-465
> URL: https://issues.apache.org/jira/browse/YARN-465
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0, 0.23.7, 2.0.4-alpha
>Reporter: Aleksey Gorshkov
>Assignee: Andrey Klochkov
> Attachments: YARN-465-branch-0.23-a.patch, 
> YARN-465-branch-0.23.patch, YARN-465-branch-2-a.patch, 
> YARN-465-branch-2--n3.patch, YARN-465-branch-2--n4.patch, 
> YARN-465-branch-2.patch, YARN-465-trunk-a.patch, YARN-465-trunk--n3.patch, 
> YARN-465-trunk--n4.patch, YARN-465-trunk.patch
>
>
> fix coverage  org.apache.hadoop.yarn.server.webproxy
> patch YARN-465-trunk.patch for trunk
> patch YARN-465-branch-2.patch for branch-2
> patch YARN-465-branch-0.23.patch for branch-0.23
> There is an issue in branch-0.23: the patch does not create the .keep file.
> To fix it, run these commands:
> mkdir 
> yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy
> touch 
> yhadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/proxy/.keep
>  



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1274) LCE fails to run containers that don't have resources to localize

2013-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787613#comment-13787613
 ] 

Hudson commented on YARN-1274:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1570 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1570/])
YARN-1274. Fixed NodeManager's LinuxContainerExecutor to create user, app-dir 
and log-dirs correctly even when there are no resources to localize for the 
container. Contributed by Siddharth Seth. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529555)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c


> LCE fails to run containers that don't have resources to localize
> -
>
> Key: YARN-1274
> URL: https://issues.apache.org/jira/browse/YARN-1274
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.1-beta
>Reporter: Alejandro Abdelnur
>Assignee: Siddharth Seth
>Priority: Blocker
> Fix For: 2.1.2-beta
>
> Attachments: YARN-1274.1.txt, YARN-1274.trunk.1.txt, 
> YARN-1274.trunk.2.txt
>
>
> LCE container launch assumes the usercache/USER directory exists and it is 
> owned by the user running the container process.
> But the directory is created only if there are resources for the LCE 
> localization command to localize. If there are no resources to localize, LCE 
> localization never executes, launching fails with exit code 255, and the NM 
> logs show something like:
> {code}
> 2013-10-04 14:07:56,425 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : command 
> provided 1
> 2013-10-04 14:07:56,425 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : user is 
> llama
> 2013-10-04 14:07:56,425 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Can't create 
> directory llama in 
> /yarn/nm/usercache/llama/appcache/application_1380853306301_0004/container_1380853306301_0004_01_04
>  - Permission denied
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1268) TestFairScheduler.testContinuousScheduling is flaky

2013-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787608#comment-13787608
 ] 

Hudson commented on YARN-1268:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1570 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1570/])
Fix location of YARN-1268 in CHANGES.txt (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529531)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
YARN-1268. TestFairScheduer.testContinuousScheduling is flaky (Sandy Ryza) 
(sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529529)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


> TestFairScheduler.testContinuousScheduling is flaky
> ---
>
> Key: YARN-1268
> URL: https://issues.apache.org/jira/browse/YARN-1268
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Fix For: 2.3.0
>
> Attachments: YARN-1268-1.patch, YARN-1268.patch
>
>
> It looks like there's a timeout in it that's causing it to be flaky.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1032) NPE in RackResolve

2013-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787610#comment-13787610
 ] 

Hudson commented on YARN-1032:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1570 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1570/])
YARN-1032. Fixed NPE in RackResolver. Contributed by Lohit Vijayarenu. 
(acmurthy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529534)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/RackResolver.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestRackResolver.java


> NPE in RackResolve
> --
>
> Key: YARN-1032
> URL: https://issues.apache.org/jira/browse/YARN-1032
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.0.5-alpha
> Environment: linux
>Reporter: Lohit Vijayarenu
>Assignee: Lohit Vijayarenu
>Priority: Critical
> Fix For: 2.1.2-beta
>
> Attachments: YARN-1032.1.patch, YARN-1032.2.patch, YARN-1032.3.patch
>
>
> We found a case where our rack resolve script was not returning a rack due to a 
> problem with resolving the host address. This exception was seen in 
> RackResolver.java as an NPE, ultimately caught in RMContainerAllocator. 
> {noformat}
> 2013-08-01 07:11:37,708 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN 
> CONTACTING RM. 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:99)
>   at 
> org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:92)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assignMapsWithLocality(RMContainerAllocator.java:1039)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assignContainers(RMContainerAllocator.java:925)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assign(RMContainerAllocator.java:861)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.access$400(RMContainerAllocator.java:681)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:219)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:243)
>   at java.lang.Thread.run(Thread.java:722)
> {noformat}
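
For context on the NPE above, here is a small, self-contained Java sketch of the 
kind of null guard that avoids it when a topology script fails to resolve a host. 
It is only an illustration; the class, method, and the "/default-rack" fallback 
are assumptions for this sketch, not the actual YARN-1032 patch.

{code}
import java.util.List;

// Illustration only -- not the actual RackResolver fix.
// Guard against a topology script that returns null or an empty result.
final class RackLookupSketch {
  static final String DEFAULT_RACK = "/default-rack";   // assumed fallback rack id

  static String resolveRack(List<String> resolvedRacks) {
    if (resolvedRacks == null || resolvedRacks.isEmpty() || resolvedRacks.get(0) == null) {
      return DEFAULT_RACK;            // fall back instead of dereferencing null
    }
    return resolvedRacks.get(0);
  }

  public static void main(String[] args) {
    System.out.println(resolveRack(null));   // prints /default-rack instead of throwing
  }
}
{code}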



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1090) Job does not get into Pending State

2013-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787611#comment-13787611
 ] 

Hudson commented on YARN-1090:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1570 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1570/])
YARN-1090. Fixed CS UI to better reflect applications as non-schedulable and 
not as pending. Contributed by Jian He. (acmurthy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529538)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java


> Job does not get into Pending State
> ---
>
> Key: YARN-1090
> URL: https://issues.apache.org/jira/browse/YARN-1090
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Jian He
> Fix For: 2.1.2-beta
>
> Attachments: YARN-1090.1.patch, YARN-1090.2.patch, YARN-1090.3.patch, 
> YARN-1090.patch
>
>
> When there is no resource available to run a job, the next job should go into 
> the pending state. The RM UI should show the next job as a pending app, and the 
> pending-app counter should be incremented.
> But currently, the next job stays in the ACCEPTED state, no AM is assigned to 
> it, and the pending app count is not incremented. 
> Running 'job -status' shows job state=PREP. 
> $ mapred job -status job_1377122233385_0002
> 13/08/21 21:59:23 INFO client.RMProxy: Connecting to ResourceManager at 
> host1/ip1
> Job: job_1377122233385_0002
> Job File: /ABC/.staging/job_1377122233385_0002/job.xml
> Job Tracking URL : http://host1:port1/application_1377122233385_0002/
> Uber job : false
> Number of maps: 0
> Number of reduces: 0
> map() completion: 0.0
> reduce() completion: 0.0
> Job state: PREP
> retired: false
> reason for failure:



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1268) TestFairScheduler.testContinuousScheduling is flaky

2013-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787593#comment-13787593
 ] 

Hudson commented on YARN-1268:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1544 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1544/])
Fix location of YARN-1268 in CHANGES.txt (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529531)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
YARN-1268. TestFairScheduer.testContinuousScheduling is flaky (Sandy Ryza) 
(sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529529)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


> TestFairScheduler.testContinuousScheduling is flaky
> ---
>
> Key: YARN-1268
> URL: https://issues.apache.org/jira/browse/YARN-1268
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Fix For: 2.3.0
>
> Attachments: YARN-1268-1.patch, YARN-1268.patch
>
>
> It looks like there's a timeout in it that's causing it to be flaky.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1032) NPE in RackResolve

2013-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787595#comment-13787595
 ] 

Hudson commented on YARN-1032:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1544 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1544/])
YARN-1032. Fixed NPE in RackResolver. Contributed by Lohit Vijayarenu. 
(acmurthy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529534)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/RackResolver.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestRackResolver.java


> NPE in RackResolve
> --
>
> Key: YARN-1032
> URL: https://issues.apache.org/jira/browse/YARN-1032
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.0.5-alpha
> Environment: linux
>Reporter: Lohit Vijayarenu
>Assignee: Lohit Vijayarenu
>Priority: Critical
> Fix For: 2.1.2-beta
>
> Attachments: YARN-1032.1.patch, YARN-1032.2.patch, YARN-1032.3.patch
>
>
> We found a case where our rack resolve script was not returning a rack due to a 
> problem with resolving the host address. This exception was seen in 
> RackResolver.java as an NPE, ultimately caught in RMContainerAllocator. 
> {noformat}
> 2013-08-01 07:11:37,708 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN 
> CONTACTING RM. 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:99)
>   at 
> org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:92)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assignMapsWithLocality(RMContainerAllocator.java:1039)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assignContainers(RMContainerAllocator.java:925)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assign(RMContainerAllocator.java:861)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.access$400(RMContainerAllocator.java:681)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:219)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:243)
>   at java.lang.Thread.run(Thread.java:722)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1090) Job does not get into Pending State

2013-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787596#comment-13787596
 ] 

Hudson commented on YARN-1090:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1544 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1544/])
YARN-1090. Fixed CS UI to better reflect applications as non-schedulable and 
not as pending. Contributed by Jian He. (acmurthy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529538)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java


> Job does not get into Pending State
> ---
>
> Key: YARN-1090
> URL: https://issues.apache.org/jira/browse/YARN-1090
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Jian He
> Fix For: 2.1.2-beta
>
> Attachments: YARN-1090.1.patch, YARN-1090.2.patch, YARN-1090.3.patch, 
> YARN-1090.patch
>
>
> When there is no resource available to run a job, the next job should go into 
> the pending state. The RM UI should show the next job as a pending app, and the 
> pending-app counter should be incremented.
> But currently, the next job stays in the ACCEPTED state, no AM is assigned to 
> it, and the pending app count is not incremented. 
> Running 'job -status' shows job state=PREP. 
> $ mapred job -status job_1377122233385_0002
> 13/08/21 21:59:23 INFO client.RMProxy: Connecting to ResourceManager at 
> host1/ip1
> Job: job_1377122233385_0002
> Job File: /ABC/.staging/job_1377122233385_0002/job.xml
> Job Tracking URL : http://host1:port1/application_1377122233385_0002/
> Uber job : false
> Number of maps: 0
> Number of reduces: 0
> map() completion: 0.0
> reduce() completion: 0.0
> Job state: PREP
> retired: false
> reason for failure:



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1274) LCE fails to run containers that don't have resources to localize

2013-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787598#comment-13787598
 ] 

Hudson commented on YARN-1274:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1544 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1544/])
YARN-1274. Fixed NodeManager's LinuxContainerExecutor to create user, app-dir 
and log-dirs correctly even when there are no resources to localize for the 
container. Contributed by Siddharth Seth. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529555)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c


> LCE fails to run containers that don't have resources to localize
> -
>
> Key: YARN-1274
> URL: https://issues.apache.org/jira/browse/YARN-1274
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.1-beta
>Reporter: Alejandro Abdelnur
>Assignee: Siddharth Seth
>Priority: Blocker
> Fix For: 2.1.2-beta
>
> Attachments: YARN-1274.1.txt, YARN-1274.trunk.1.txt, 
> YARN-1274.trunk.2.txt
>
>
> LCE container launch assumes the usercache/USER directory exists and it is 
> owned by the user running the container process.
> But the directory is created only if there are resources for the LCE 
> localization command to localize. If there are no resources to localize, LCE 
> localization never executes, launching fails with exit code 255, and the NM 
> logs show something like:
> {code}
> 2013-10-04 14:07:56,425 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : command 
> provided 1
> 2013-10-04 14:07:56,425 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : user is 
> llama
> 2013-10-04 14:07:56,425 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Can't create 
> directory llama in 
> /yarn/nm/usercache/llama/appcache/application_1380853306301_0004/container_1380853306301_0004_01_04
>  - Permission denied
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1130) Improve the log flushing for tasks when mapred.userlog.limit.kb is set

2013-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787573#comment-13787573
 ] 

Hadoop QA commented on YARN-1130:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12607066/YARN-1130.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common:

  org.apache.hadoop.mapred.TestJobCleanup
  org.apache.hadoop.mapred.TestTaskCommit

  The following test timeouts occurred in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common:

org.apache.hadoop.mapreduce.v2.TestUberAM

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2131//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2131//console

This message is automatically generated.

> Improve the log flushing for tasks when mapred.userlog.limit.kb is set
> --
>
> Key: YARN-1130
> URL: https://issues.apache.org/jira/browse/YARN-1130
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.0.5-alpha
>Reporter: Paul Han
>Assignee: Paul Han
> Fix For: 2.0.5-alpha
>
> Attachments: YARN-1130.patch, YARN-1130.patch
>
>
> When the userlog limit is set with something like this:
> {code}
> <property>
>   <name>mapred.userlog.limit.kb</name>
>   <value>2048</value>
>   <description>The maximum size of user-logs of each task in KB. 0 disables the
>   cap.</description>
> </property>
> {code}
> the log entries will be truncated randomly for the jobs.
> The log size is left between 1.2MB and 1.6MB.
> Since the log is already limited, avoiding log truncation is crucial for the 
> user.
> The other issue with the current 
> implementation (org.apache.hadoop.yarn.ContainerLogAppender) is that log entries 
> are not flushed to file until the container shuts down and the log manager 
> closes all appenders. If the user wants to see the log during task execution, 
> that is not supported.
> Will propose a patch to add a flush mechanism and also flush the log when the 
> task is done.  
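
To make the proposal concrete, here is a minimal, self-contained Java sketch of 
the two behaviours described above: flush once a buffered threshold is reached, 
and flush again when the task finishes. It is only an illustration under invented 
names; it is not the actual ContainerLogAppender change.

{code}
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

// Illustration only -- not the proposed ContainerLogAppender patch.
// Flushes periodically while the task runs and once more when the task is done.
final class FlushingTaskLogSketch {
  private final BufferedWriter out;
  private final long flushThresholdBytes;
  private long bytesSinceFlush = 0;

  FlushingTaskLogSketch(String path, long flushThresholdBytes) throws IOException {
    this.out = new BufferedWriter(new FileWriter(path));
    this.flushThresholdBytes = flushThresholdBytes;
  }

  void append(String line) throws IOException {
    out.write(line);
    out.newLine();
    bytesSinceFlush += line.length() + 1;
    if (bytesSinceFlush >= flushThresholdBytes) {   // periodic flush during execution
      out.flush();
      bytesSinceFlush = 0;
    }
  }

  void close() throws IOException {                 // "task is done": final flush
    out.flush();
    out.close();
  }

  public static void main(String[] args) throws IOException {
    FlushingTaskLogSketch log = new FlushingTaskLogSketch("task-log.txt", 4096);
    try {
      log.append("log lines become visible before the container shuts down");
    } finally {
      log.close();
    }
  }
}
{code}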



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1090) Job does not get into Pending State

2013-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787564#comment-13787564
 ] 

Hudson commented on YARN-1090:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #354 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/354/])
YARN-1090. Fixed CS UI to better reflect applications as non-schedulable and 
not as pending. Contributed by Jian He. (acmurthy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529538)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/QueueMetrics.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java


> Job does not get into Pending State
> ---
>
> Key: YARN-1090
> URL: https://issues.apache.org/jira/browse/YARN-1090
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Jian He
> Fix For: 2.1.2-beta
>
> Attachments: YARN-1090.1.patch, YARN-1090.2.patch, YARN-1090.3.patch, 
> YARN-1090.patch
>
>
> When there is no resource available to run a job, the next job should go into 
> the pending state. The RM UI should show the next job as a pending app, and the 
> pending-app counter should be incremented.
> But currently, the next job stays in the ACCEPTED state, no AM is assigned to 
> it, and the pending app count is not incremented. 
> Running 'job -status' shows job state=PREP. 
> $ mapred job -status job_1377122233385_0002
> 13/08/21 21:59:23 INFO client.RMProxy: Connecting to ResourceManager at 
> host1/ip1
> Job: job_1377122233385_0002
> Job File: /ABC/.staging/job_1377122233385_0002/job.xml
> Job Tracking URL : http://host1:port1/application_1377122233385_0002/
> Uber job : false
> Number of maps: 0
> Number of reduces: 0
> map() completion: 0.0
> reduce() completion: 0.0
> Job state: PREP
> retired: false
> reason for failure:



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1274) LCE fails to run containers that don't have resources to localize

2013-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787566#comment-13787566
 ] 

Hudson commented on YARN-1274:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #354 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/354/])
YARN-1274. Fixed NodeManager's LinuxContainerExecutor to create user, app-dir 
and log-dirs correctly even when there are no resources to localize for the 
container. Contributed by Siddharth Seth. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529555)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c


> LCE fails to run containers that don't have resources to localize
> -
>
> Key: YARN-1274
> URL: https://issues.apache.org/jira/browse/YARN-1274
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.1.1-beta
>Reporter: Alejandro Abdelnur
>Assignee: Siddharth Seth
>Priority: Blocker
> Fix For: 2.1.2-beta
>
> Attachments: YARN-1274.1.txt, YARN-1274.trunk.1.txt, 
> YARN-1274.trunk.2.txt
>
>
> LCE container launch assumes that the usercache/USER directory exists and is
> owned by the user running the container process.
> But the directory is created only when the LCE localization command has
> resources to localize. If there are no resources to localize, LCE
> localization never runs, the launch fails with exit code 255, and the NM
> logs contain something like:
> {code}
> 2013-10-04 14:07:56,425 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : command 
> provided 1
> 2013-10-04 14:07:56,425 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : user is 
> llama
> 2013-10-04 14:07:56,425 INFO 
> org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Can't create 
> directory llama in 
> /yarn/nm/usercache/llama/appcache/application_1380853306301_0004/container_1380853306301_0004_01_04
>  - Permission denied
> {code}
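
For reference, the failing path is taken by any container whose launch context has nothing to localize. A minimal AM-side sketch of such a launch context (illustrative only, not from the patch):

{code}
// Illustrative only: a launch context with an empty local-resources map,
// i.e. nothing for the NodeManager to localize, which is exactly the path
// that failed under the LinuxContainerExecutor.
import java.util.Collections;

import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.util.Records;

public class NoResourceLaunchContext {
  public static ContainerLaunchContext build() {
    ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
    // No files to localize for this container.
    ctx.setLocalResources(Collections.<String, LocalResource>emptyMap());
    // A trivial command; stdout/stderr go to the container log directory.
    ctx.setCommands(Collections.singletonList("echo hello 1>>stdout 2>>stderr"));
    return ctx;
  }
}
{code}

The commit referenced above changes the native container-executor so that the user, app and log directories are created even on this no-localization path.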



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1032) NPE in RackResolve

2013-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787563#comment-13787563
 ] 

Hudson commented on YARN-1032:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #354 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/354/])
YARN-1032. Fixed NPE in RackResolver. Contributed by Lohit Vijayarenu. 
(acmurthy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529534)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/RackResolver.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestRackResolver.java


> NPE in RackResolve
> --
>
> Key: YARN-1032
> URL: https://issues.apache.org/jira/browse/YARN-1032
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.0.5-alpha
> Environment: linux
>Reporter: Lohit Vijayarenu
>Assignee: Lohit Vijayarenu
>Priority: Critical
> Fix For: 2.1.2-beta
>
> Attachments: YARN-1032.1.patch, YARN-1032.2.patch, YARN-1032.3.patch
>
>
> We found a case where our rack resolution script did not return a rack
> because of a problem resolving the host address. This surfaced as an NPE in
> RackResolver.java, ultimately caught in RMContainerAllocator.
> {noformat}
> 2013-08-01 07:11:37,708 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN 
> CONTACTING RM. 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.util.RackResolver.coreResolve(RackResolver.java:99)
>   at 
> org.apache.hadoop.yarn.util.RackResolver.resolve(RackResolver.java:92)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assignMapsWithLocality(RMContainerAllocator.java:1039)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assignContainers(RMContainerAllocator.java:925)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.assign(RMContainerAllocator.java:861)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator$ScheduledRequests.access$400(RMContainerAllocator.java:681)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:219)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator$1.run(RMCommunicator.java:243)
>   at java.lang.Thread.run(Thread.java:722)
> {noformat}
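
The NPE comes from dereferencing the result of the topology mapping when the script returns nothing. A defensive sketch of the resolution step (not necessarily the committed patch; one common way to make this null-safe is to fall back to the default rack):

{code}
// Sketch: resolve a host to a rack, defaulting when the mapping fails.
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.net.DNSToSwitchMapping;
import org.apache.hadoop.net.NetworkTopology;
import org.apache.hadoop.net.Node;
import org.apache.hadoop.net.NodeBase;

public class SafeRackResolve {
  static Node coreResolve(DNSToSwitchMapping mapping, String hostName) {
    List<String> rNameList = mapping.resolve(Collections.singletonList(hostName));
    String rName;
    if (rNameList == null || rNameList.isEmpty() || rNameList.get(0) == null) {
      // Resolution failed (e.g. the rack script could not resolve the host):
      // use the default rack rather than letting callers hit an NPE.
      rName = NetworkTopology.DEFAULT_RACK;
    } else {
      rName = rNameList.get(0);
    }
    return new NodeBase(hostName, rName);
  }
}
{code}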



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (YARN-1268) TestFairScheduler.testContinuousScheduling is flaky

2013-10-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787561#comment-13787561
 ] 

Hudson commented on YARN-1268:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #354 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/354/])
Fix location of YARN-1268 in CHANGES.txt (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529531)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
YARN-1268. TestFairScheduer.testContinuousScheduling is flaky (Sandy Ryza) 
(sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1529529)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java


> TestFairScheduler.testContinuousScheduling is flaky
> ---
>
> Key: YARN-1268
> URL: https://issues.apache.org/jira/browse/YARN-1268
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Fix For: 2.3.0
>
> Attachments: YARN-1268-1.patch, YARN-1268.patch
>
>
> It looks like there's a timeout in it that's causing it to be flaky.
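
A common way to harden a timing-dependent test like this is to re-check the expected condition up to a deadline instead of asserting after one fixed sleep. A small self-contained sketch follows (the names are made up; this is not the committed change):

{code}
// A general de-flaking sketch: poll for a condition with a deadline.
public class WaitUtil {

  /** Condition callback used only for this sketch. */
  public interface Condition {
    boolean isMet();
  }

  /** Poll every 50 ms until the condition holds or the deadline passes. */
  public static void waitFor(Condition condition, long timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!condition.isMet()) {
      if (System.currentTimeMillis() > deadline) {
        throw new AssertionError("condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(50);
    }
  }
}
{code}

The test would then wait on the scheduler state it cares about (for example, that a continuous scheduling pass has actually happened) rather than sleeping for a fixed interval and asserting once.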



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (YARN-1130) Improve the log flushing for tasks when mapred.userlog.limit.kb is set

2013-10-06 Thread Paul Han (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Han updated YARN-1130:
---

Attachment: YARN-1130.patch

> Improve the log flushing for tasks when mapred.userlog.limit.kb is set
> --
>
> Key: YARN-1130
> URL: https://issues.apache.org/jira/browse/YARN-1130
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.0.5-alpha
>Reporter: Paul Han
>Assignee: Paul Han
> Fix For: 2.0.5-alpha
>
> Attachments: YARN-1130.patch, YARN-1130.patch
>
>
> When userlog limit is set with something like this:
> {code}
> <property>
>   <name>mapred.userlog.limit.kb</name>
>   <value>2048</value>
>   <description>The maximum size of user-logs of each task in KB. 0 disables
>   the cap.</description>
> </property>
> {code}
> the log entries are truncated unpredictably across jobs.
> The resulting log size is left somewhere between 1.2 MB and 1.6 MB.
> Since the log is already capped, avoiding this truncation is crucial for the
> user.
> The other issue with the current implementation
> (org.apache.hadoop.yarn.ContainerLogAppender) is that log entries are not
> flushed to the file until the container shuts down and the log manager
> closes all appenders, so a user who wants to see the log during task
> execution cannot.
> Will propose a patch that adds a flush mechanism and also flushes the log
> when the task is done.
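
If an appender along the lines of the PeriodicFlushFileAppender sketched earlier in this digest were used, attaching it programmatically would look roughly like the following; the class and property names are hypothetical and carried over from that sketch, not from the attached patch.

{code}
// Hypothetical usage of the appender sketched earlier in this digest.
import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;

import example.PeriodicFlushFileAppender;

public class AttachAppender {
  public static void main(String[] args) {
    PeriodicFlushFileAppender appender = new PeriodicFlushFileAppender();
    // yarn.app.container.log.dir is the system property YARN sets for
    // container JVMs; fall back to /tmp so the sketch runs standalone.
    String logDir = System.getProperty("yarn.app.container.log.dir", "/tmp");
    appender.setFile(logDir + "/syslog");
    appender.setEventsPerFlush(100);
    appender.setLayout(new PatternLayout("%d{ISO8601} %p [%t] %c: %m%n"));
    appender.activateOptions();                 // opens the log file
    Logger.getRootLogger().addAppender(appender);
    Logger.getRootLogger().info("entries now reach disk without waiting for shutdown");
  }
}
{code}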



--
This message was sent by Atlassian JIRA
(v6.1#6144)