[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697629#comment-13697629 ] Devaraj K commented on YARN-353: The patch overall looks good; here are my observations on the patch.
1.
{code:xml}
+  <property>
+    <description>ACLs to be used for ZooKeeper znodes.
+    This may be supplied when using
+    org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore
+    as the value for yarn.resourcemanager.store.class</description>
+    <name>yarn.resourcemanager.zk.rm-state-store.timeout.ms</name>
+    <!--<value>world:anyone:rwcda</value>-->
+  </property>
{code}
Here the configuration name should be yarn.resourcemanager.zk.rm-state-store.acl.
2.
{code}
+  // protected to mock for testing
+  protected synchronized ZooKeeper getNewZooKeeper() throws Exception {
{code}
Can we also annotate this method with @VisibleForTesting?
3.
{code}
+  /** HostPort of ZK server for ZKRMStateStore */
+    <description>HostPort of the ZooKeeper server when using
{code}
In these two places, can we use Host:Port instead of HostPort for the comment/description?
4.
{code}
+    zkHostPort = conf.get(YarnConfiguration.ZK_RM_STATE_STORE_ADDRESS);
{code}
Can we use a default value for this config, as is present for the other props:
{code:xml}
+    <!--<value>127.0.0.1:2181</value>-->
{code}
5.
{code}
+  public static final String DEFAULT_ZK_RM_STATE_STORE_PARENT_PATH = "";
{code}
Can we use a default value for this config instead of having it empty:
{code:xml}
+    <!--<value>/rmstore</value>-->
{code}
Add Zookeeper-based store implementation for RMStateStore - Key: YARN-353 URL: https://issues.apache.org/jira/browse/YARN-353 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Hitesh Shah Assignee: Bikas Saha Attachments: YARN-353.1.patch, YARN-353.2.patch Add store that writes RM state data to ZK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
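For clarity, here is a sketch of how the yarn-default.xml entry from item 1 might look once the suggested name change is applied; the layout follows the quoted snippet and should be treated as illustrative, not as the committed patch:
{code:xml}
<property>
  <description>ACLs to be used for ZooKeeper znodes.
  This may be supplied when using
  org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore
  as the value for yarn.resourcemanager.store.class</description>
  <!-- name corrected from ...timeout.ms so it matches the ACL description -->
  <name>yarn.resourcemanager.zk.rm-state-store.acl</name>
  <!--<value>world:anyone:rwcda</value>-->
</property>
{code}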
[jira] [Resolved] (YARN-379) yarn [node,application] command print logger info messages
[ https://issues.apache.org/jira/browse/YARN-379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash resolved YARN-379. --- Resolution: Not A Problem Fix Version/s: 2.2.0 This issue seems to have been fixed by YARN-530. I do not see the annoying message(s) any more. yarn [node,application] command print logger info messages -- Key: YARN-379 URL: https://issues.apache.org/jira/browse/YARN-379 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.0.3-alpha Reporter: Thomas Graves Assignee: Ravi Prakash Labels: usability Fix For: 2.2.0 Attachments: YARN-379.patch, YARN-379.patch Running the yarn node and yarn application commands results in annoying log info messages being printed:
$ yarn node -list
13/02/06 02:36:50 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/02/06 02:36:50 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
Total Nodes:1
        Node-Id    Node-State    Node-Http-Address    Health-Status(isNodeHealthy)    Running-Containers
       foo:8041       RUNNING             foo:8042                            true                     0
13/02/06 02:36:50 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped.
$ yarn application
13/02/06 02:38:47 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
13/02/06 02:38:47 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
Invalid Command Usage :
usage: application
 -kill <arg>     Kills the application.
 -list           Lists all the Applications from RM.
 -status <arg>   Prints the status of the application.
13/02/06 02:38:47 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-395) RM should have a way to disable scheduling to a set of nodes
[ https://issues.apache.org/jira/browse/YARN-395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha resolved YARN-395. - Resolution: Fixed YARN-750 covers most cases. RM should have a way to disable scheduling to a set of nodes Key: YARN-395 URL: https://issues.apache.org/jira/browse/YARN-395 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Arun C Murthy There should be a way to say schedule to A, B and C but never to D. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-395) RM should have a way to disable scheduling to a set of nodes
[ https://issues.apache.org/jira/browse/YARN-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698018#comment-13698018 ] Bikas Saha commented on YARN-395: - Not exactly. But it should be good enough for now. RM should have a way to disable scheduling to a set of nodes Key: YARN-395 URL: https://issues.apache.org/jira/browse/YARN-395 Project: Hadoop YARN Issue Type: Sub-task Reporter: Bikas Saha Assignee: Arun C Murthy There should be a way to say schedule to A, B and C but never to D. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM
[ https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-763: --- Attachment: YARN-763.2.patch
1. Remove the boolean stop.
2. Put the heartbeat thread interrupt logic inside the switch block.
AMRMClientAsync should stop heartbeating after receiving shutdown from RM - Key: YARN-763 URL: https://issues.apache.org/jira/browse/YARN-763 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-763.1.patch, YARN-763.2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
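To make the second point concrete, a minimal sketch of the shape of that change, with hypothetical names (the real AMRMClientAsync heartbeat loop is more involved than this):
{code:java}
// Minimal sketch, hypothetical names: on AM_SHUTDOWN the heartbeat thread is
// interrupted from inside the switch itself, instead of setting a separate
// boolean stop flag that the loop would have to keep polling.
public class HeartbeatShutdownSketch {
  enum AMCommand { AM_RESYNC, AM_SHUTDOWN }

  private final Thread heartbeatThread;

  HeartbeatShutdownSketch(Thread heartbeatThread) {
    this.heartbeatThread = heartbeatThread;
  }

  void onCommand(AMCommand command) {
    switch (command) {
      case AM_SHUTDOWN:
        heartbeatThread.interrupt(); // stop heartbeating immediately
        break;
      case AM_RESYNC:
        // re-registration path, out of scope for this sketch
        break;
    }
  }
}
{code}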
[jira] [Updated] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-353: - Attachment: YARN-353.3.patch Devaraj, thanks for your review. New patch attached; fixed the findbugs warning and addressed the comments. Add Zookeeper-based store implementation for RMStateStore - Key: YARN-353 URL: https://issues.apache.org/jira/browse/YARN-353 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Hitesh Shah Assignee: Bikas Saha Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch Add store that writes RM state data to ZK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-763) AMRMClientAsync should stop heartbeating after receiving shutdown from RM
[ https://issues.apache.org/jira/browse/YARN-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698071#comment-13698071 ] Hadoop QA commented on YARN-763: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12590473/YARN-763.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client: org.apache.hadoop.yarn.client.api.impl.TestNMClient {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1415//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1415//console This message is automatically generated. AMRMClientAsync should stop heartbeating after receiving shutdown from RM - Key: YARN-763 URL: https://issues.apache.org/jira/browse/YARN-763 Project: Hadoop YARN Issue Type: Bug Reporter: Bikas Saha Assignee: Xuan Gong Attachments: YARN-763.1.patch, YARN-763.2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-353) Add Zookeeper-based store implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698099#comment-13698099 ] Hadoop QA commented on YARN-353: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12590474/YARN-353.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1416//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1416//console This message is automatically generated. Add Zookeeper-based store implementation for RMStateStore - Key: YARN-353 URL: https://issues.apache.org/jira/browse/YARN-353 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Hitesh Shah Assignee: Bikas Saha Attachments: YARN-353.1.patch, YARN-353.2.patch, YARN-353.3.patch Add store that writes RM state data to ZK -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (YARN-894) NodeHealthScriptRunner timeout checking is inaccurate on Windows
[ https://issues.apache.org/jira/browse/YARN-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth reassigned YARN-894: -- Assignee: Chris Nauroth (was: Chuan Liu) NodeHealthScriptRunner timeout checking is inaccurate on Windows Key: YARN-894 URL: https://issues.apache.org/jira/browse/YARN-894 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.1.0-beta Reporter: Chuan Liu Assignee: Chris Nauroth Priority: Minor Attachments: ReadProcessStdout.java, wait.cmd, wait.sh, YARN-894-trunk.patch In the {{NodeHealthScriptRunner}} method, we set the HealthChecker status based on the Shell execution results. Some statuses are based on the exception thrown during the Shell script execution. Currently, we catch a non-ExitCodeException from ShellCommandExecutor, and if Shell has the timeout status set at the same time, we also set the HealthChecker status to timeout. We have the following execution sequence in Shell:
1) In the main thread, schedule a delayed timer task that will kill the original process upon timeout.
2) In the main thread, open a buffered reader over the process's standard output stream.
3) When the timeout happens, the timer task calls {{Process#destroy()}} to kill the main process.
On Linux, when the timeout happens and the process is killed, the buffered reader throws an IOException with the message "Stream closed" in the main thread. On Windows, we don't get the IOException; only -1 is returned from the reader, which indicates the stream is finished. As a result, the timeout status is not set on Windows, and {{TestNodeHealthService}} fails on Windows because of this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
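A standalone sketch of the platform difference described above; the "sleep 60" command is a stand-in for a health script that overruns its timeout, and the per-platform behavior in the comments is as reported in this issue:
{code:java}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// After Process#destroy(), reading the process's stdout throws IOException
// ("Stream closed") on Linux, while on Windows the read simply returns -1
// (EOF), so timeout detection that relies on catching the exception never
// fires there.
public class ReadAfterDestroy {
  public static void main(String[] args) throws Exception {
    // Stand-in for a node health script that exceeds its timeout.
    Process p = new ProcessBuilder("sleep", "60").start();
    BufferedReader reader =
        new BufferedReader(new InputStreamReader(p.getInputStream()));
    p.destroy(); // what the delayed timer task does when the timeout fires
    try {
      int c = reader.read();
      System.out.println("read returned " + c); // -1 on Windows: looks like a normal EOF
    } catch (IOException e) {
      System.out.println("IOException: " + e.getMessage()); // "Stream closed" on Linux
    }
  }
}
{code}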
[jira] [Commented] (YARN-710) Add to ser/deser methods to RecordFactory
[ https://issues.apache.org/jira/browse/YARN-710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698232#comment-13698232 ] Siddharth Seth commented on YARN-710: - In the unit test, the setters on the ApplicationId aren't meant to be used (they will end up throwing exceptions - this is replaced by newInstance in ApplicationId). I don't think getProto() needs to be changed at all in RecordFactoryPBImpl - instead a new getBuilder method should be sufficient. Somewhere along the flow, it looks like the default proto ends up being created - possibly linked to the getProto changes. Add to ser/deser methods to RecordFactory - Key: YARN-710 URL: https://issues.apache.org/jira/browse/YARN-710 Project: Hadoop YARN Issue Type: Improvement Components: api Affects Versions: 2.0.4-alpha Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: YARN-710.patch, YARN-710.patch, YARN-710-wip.patch In order to do things like AM failover and checkpointing I need to serialize app IDs, app attempt IDs, containers and/or IDs, resource requests, etc. Because we are wrapping/hiding the PB implementation from the APIs, we are hiding the built-in PB ser/deser capabilities. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
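For context, a hedged sketch of the kind of ser/deser being discussed, going through the record's underlying protobuf; the PB impl class and constructor follow YARN's conventions but should be treated as assumptions rather than a confirmed API:
{code:java}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.impl.pb.ApplicationIdPBImpl;
import org.apache.hadoop.yarn.proto.YarnProtos.ApplicationIdProto;

// Serialize/deserialize an ApplicationId via its protobuf. The JIRA is about
// exposing this through RecordFactory (e.g. a getBuilder method) instead of
// requiring callers to cast to the PB impl as done here.
public class AppIdSerDe {
  public static byte[] serialize(ApplicationId appId) {
    return ((ApplicationIdPBImpl) appId).getProto().toByteArray();
  }

  public static ApplicationId deserialize(byte[] bytes) throws Exception {
    return new ApplicationIdPBImpl(ApplicationIdProto.parseFrom(bytes));
  }
}
{code}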
[jira] [Updated] (YARN-814) Difficult to diagnose a failed container launch when error due to invalid environment variable
[ https://issues.apache.org/jira/browse/YARN-814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-814: - Attachment: YARN-814.4.patch New patch, accounting for both stdout and stderr. Difficult to diagnose a failed container launch when error due to invalid environment variable -- Key: YARN-814 URL: https://issues.apache.org/jira/browse/YARN-814 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah Assignee: Jian He Attachments: YARN-814.1.patch, YARN-814.2.patch, YARN-814.3.patch, YARN-814.4.patch, YARN-814.patch The container's launch script sets up environment variables, symlinks etc. If there is any failure when setting up the basic context ( before the actual user's process is launched ), nothing is captured by the NM. This makes it impossible to diagnose the reason for the failure. To reproduce, set an env var where the value contains characters that throw syntax errors in bash. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
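A rough, hypothetical illustration of the idea at the process level — redirect both streams of the launch script to files the NM can read back as diagnostics. File names and structure are invented for the sketch; this is not the patch itself:
{code:java}
import java.io.File;

// Hypothetical sketch: run the container launch script with stdout and stderr
// captured to files, so a failure in the set-up portion of the script (e.g. a
// bash syntax error from a bad env var value) leaves something to diagnose.
public class LaunchWithDiagnostics {
  public static int launch(File script, File stdout, File stderr) throws Exception {
    Process p = new ProcessBuilder("bash", script.getAbsolutePath())
        .redirectOutput(stdout) // set-up output lands here
        .redirectError(stderr)  // bash syntax errors land here
        .start();
    return p.waitFor(); // non-zero exit => surface stdout/stderr as diagnostics
  }
}
{code}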
[jira] [Commented] (YARN-814) Difficult to diagnose a failed container launch when error due to invalid environment variable
[ https://issues.apache.org/jira/browse/YARN-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698250#comment-13698250 ] Jian He commented on YARN-814: -- Ran on a single node and saw the log messages. Difficult to diagnose a failed container launch when error due to invalid environment variable -- Key: YARN-814 URL: https://issues.apache.org/jira/browse/YARN-814 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah Assignee: Jian He Attachments: YARN-814.1.patch, YARN-814.2.patch, YARN-814.3.patch, YARN-814.4.patch, YARN-814.patch The container's launch script sets up environment variables, symlinks etc. If there is any failure when setting up the basic context ( before the actual user's process is launched ), nothing is captured by the NM. This makes it impossible to diagnose the reason for the failure. To reproduce, set an env var where the value contains characters that throw syntax errors in bash. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-871) Failed to run MR example against latest trunk
[ https://issues.apache.org/jira/browse/YARN-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698267#comment-13698267 ] Junping Du commented on YARN-871: - Hi [~zjshen], given YARN-874 is committed, shall we resolve it? Failed to run MR example against latest trunk - Key: YARN-871 URL: https://issues.apache.org/jira/browse/YARN-871 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Attachments: yarn-zshen-resourcemanager-ZShens-MacBook-Pro.local.log Built the latest trunk, deployed a single node cluster and ran examples, such as
{code}
hadoop jar hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-SNAPSHOT.jar teragen 10 out1
{code}
The job failed with the following console message:
{code}
13/06/21 12:51:25 INFO mapreduce.Job: Running job: job_1371844267731_0001
13/06/21 12:51:31 INFO mapreduce.Job: Job job_1371844267731_0001 running in uber mode : false
13/06/21 12:51:31 INFO mapreduce.Job: map 0% reduce 0%
13/06/21 12:51:31 INFO mapreduce.Job: Job job_1371844267731_0001 failed with state FAILED due to: Application application_1371844267731_0001 failed 2 times due to AM Container for appattempt_1371844267731_0001_02 exited with exitCode: 127 due to: .Failing this attempt.. Failing the application.
13/06/21 12:51:31 INFO mapreduce.Job: Counters: 0
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-814) Difficult to diagnose a failed container launch when error due to invalid environment variable
[ https://issues.apache.org/jira/browse/YARN-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698280#comment-13698280 ] Hadoop QA commented on YARN-814: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12590515/YARN-814.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1417//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1417//console This message is automatically generated. Difficult to diagnose a failed container launch when error due to invalid environment variable -- Key: YARN-814 URL: https://issues.apache.org/jira/browse/YARN-814 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah Assignee: Jian He Attachments: YARN-814.1.patch, YARN-814.2.patch, YARN-814.3.patch, YARN-814.4.patch, YARN-814.patch The container's launch script sets up environment variables, symlinks etc. If there is any failure when setting up the basic context ( before the actual user's process is launched ), nothing is captured by the NM. This makes it impossible to diagnose the reason for the failure. To reproduce, set an env var where the value contains characters that throw syntax errors in bash. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (YARN-871) Failed to run MR example against latest trunk
[ https://issues.apache.org/jira/browse/YARN-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-871. -- Resolution: Cannot Reproduce Thanks, [~djp]! Closing it as cannot reproduce. Failed to run MR example against latest trunk - Key: YARN-871 URL: https://issues.apache.org/jira/browse/YARN-871 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Attachments: yarn-zshen-resourcemanager-ZShens-MacBook-Pro.local.log Built the latest trunk, deployed a single node cluster and ran examples, such as
{code}
hadoop jar hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-SNAPSHOT.jar teragen 10 out1
{code}
The job failed with the following console message:
{code}
13/06/21 12:51:25 INFO mapreduce.Job: Running job: job_1371844267731_0001
13/06/21 12:51:31 INFO mapreduce.Job: Job job_1371844267731_0001 running in uber mode : false
13/06/21 12:51:31 INFO mapreduce.Job: map 0% reduce 0%
13/06/21 12:51:31 INFO mapreduce.Job: Job job_1371844267731_0001 failed with state FAILED due to: Application application_1371844267731_0001 failed 2 times due to AM Container for appattempt_1371844267731_0001_02 exited with exitCode: 127 due to: .Failing this attempt.. Failing the application.
13/06/21 12:51:31 INFO mapreduce.Job: Counters: 0
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-845) RM crash with NPE on NODE_UPDATE
[ https://issues.apache.org/jira/browse/YARN-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698326#comment-13698326 ] Mayank Bansal commented on YARN-845: I had an offline discussion with [~arpitgupta] and [~bikassaha]. We are not able to reproduce the issue; however, we can synchronize the application object in assignReservedContainer to make it consistent with the other calls. I am adding more logs to find the issue if we can get this crash again. I am also throwing YARN runtime exceptions if we get this null again. Thanks, Mayank RM crash with NPE on NODE_UPDATE Key: YARN-845 URL: https://issues.apache.org/jira/browse/YARN-845 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.0-beta Reporter: Arpit Gupta Assignee: Mayank Bansal Attachments: rm.log, YARN-845-trunk-draft.patch the following stack trace is generated in rm
{code}
n, service: 68.142.246.147:45454 }, ] resource=<memory:1536, vCores:1> queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:44544, vCores:29>usedCapacity=0.90625, absoluteUsedCapacity=0.90625, numApps=1, numContainers=29 usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used=<memory:44544, vCores:29> cluster=<memory:49152, vCores:48>
2013-06-17 12:43:53,655 INFO capacity.ParentQueue (ParentQueue.java:completedContainer(696)) - completedContainer queue=root usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used=<memory:44544, vCores:29> cluster=<memory:49152, vCores:48>
2013-06-17 12:43:53,656 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(832)) - Application appattempt_1371448527090_0844_01 released container container_1371448527090_0844_01_05 on node: host: hostXX:45454 #containers=4 available=2048 used=6144 with event: FINISHED
2013-06-17 12:43:53,656 INFO capacity.CapacityScheduler (CapacityScheduler.java:nodeUpdate(661)) - Trying to fulfill reservation for application application_1371448527090_0844 on node: hostXX:45454
2013-06-17 12:43:53,656 INFO fica.FiCaSchedulerApp (FiCaSchedulerApp.java:unreserve(435)) - Application application_1371448527090_0844 unreserved on node host: hostXX:45454 #containers=4 available=2048 used=6144, currently has 4 at priority 20; currentReservation <memory:6144, vCores:4>
2013-06-17 12:43:53,656 INFO scheduler.AppSchedulingInfo (AppSchedulingInfo.java:updateResourceRequests(168)) - checking for deactivate...
2013-06-17 12:43:53,657 FATAL resourcemanager.ResourceManager (ResourceManager.java:run(422)) - Error in handling event type NODE_UPDATE to the scheduler
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:432)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.unreserve(LeafQueue.java:1416)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1346)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1221)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1180)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:939)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:803)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:665)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:727)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:83)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:413)
	at java.lang.Thread.run(Thread.java:662)
2013-06-17 12:43:53,659 INFO resourcemanager.ResourceManager (ResourceManager.java:run(426)) - Exiting, bbye..
2013-06-17 12:43:53,665 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped SelectChannelConnector@hostXX:8088
2013-06-17 12:43:53,765 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(513)) - InterruptedExcpetion recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep interrupted
2013-06-17 12:43:53,766 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager
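A tiny sketch, with stand-in types, of the synchronization Mayank describes; the real change lives in the CapacityScheduler/LeafQueue code paths:
{code:java}
// Stand-in types only: take the application's lock around reserved-container
// assignment so a concurrent completedContainer cannot null out the
// reservation mid-flight, matching the locking on the other scheduling calls.
public class AssignReservedSketch {
  static final class SchedulerApp { /* stand-in for FiCaSchedulerApp */ }

  void assignReservedContainer(SchedulerApp application) {
    synchronized (application) {
      // unreserve + re-assign happen atomically under the app lock
    }
  }
}
{code}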
[jira] [Updated] (YARN-845) RM crash with NPE on NODE_UPDATE
[ https://issues.apache.org/jira/browse/YARN-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-845: --- Attachment: YARN-845-trunk-1.patch Attaching updated patch and rebasing it. Thanks, Mayank RM crash with NPE on NODE_UPDATE Key: YARN-845 URL: https://issues.apache.org/jira/browse/YARN-845 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.0-beta Reporter: Arpit Gupta Assignee: Mayank Bansal Attachments: rm.log, YARN-845-trunk-1.patch, YARN-845-trunk-draft.patch the following stack trace is generated in rm
{code}
n, service: 68.142.246.147:45454 }, ] resource=<memory:1536, vCores:1> queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:44544, vCores:29>usedCapacity=0.90625, absoluteUsedCapacity=0.90625, numApps=1, numContainers=29 usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used=<memory:44544, vCores:29> cluster=<memory:49152, vCores:48>
2013-06-17 12:43:53,655 INFO capacity.ParentQueue (ParentQueue.java:completedContainer(696)) - completedContainer queue=root usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used=<memory:44544, vCores:29> cluster=<memory:49152, vCores:48>
2013-06-17 12:43:53,656 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(832)) - Application appattempt_1371448527090_0844_01 released container container_1371448527090_0844_01_05 on node: host: hostXX:45454 #containers=4 available=2048 used=6144 with event: FINISHED
2013-06-17 12:43:53,656 INFO capacity.CapacityScheduler (CapacityScheduler.java:nodeUpdate(661)) - Trying to fulfill reservation for application application_1371448527090_0844 on node: hostXX:45454
2013-06-17 12:43:53,656 INFO fica.FiCaSchedulerApp (FiCaSchedulerApp.java:unreserve(435)) - Application application_1371448527090_0844 unreserved on node host: hostXX:45454 #containers=4 available=2048 used=6144, currently has 4 at priority 20; currentReservation <memory:6144, vCores:4>
2013-06-17 12:43:53,656 INFO scheduler.AppSchedulingInfo (AppSchedulingInfo.java:updateResourceRequests(168)) - checking for deactivate...
2013-06-17 12:43:53,657 FATAL resourcemanager.ResourceManager (ResourceManager.java:run(422)) - Error in handling event type NODE_UPDATE to the scheduler
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:432)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.unreserve(LeafQueue.java:1416)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1346)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1221)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1180)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:939)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:803)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:665)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:727)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:83)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:413)
	at java.lang.Thread.run(Thread.java:662)
2013-06-17 12:43:53,659 INFO resourcemanager.ResourceManager (ResourceManager.java:run(426)) - Exiting, bbye..
2013-06-17 12:43:53,665 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped SelectChannelConnector@hostXX:8088
2013-06-17 12:43:53,765 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(513)) - InterruptedExcpetion recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep interrupted
2013-06-17 12:43:53,766 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics system...
2013-06-17 12:43:53,767 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
2013-06-17 12:43:53,767 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system shutdown complete.
2013-06-17
[jira] [Commented] (YARN-245) Node Manager gives InvalidStateTransitonException for FINISH_APPLICATION at FINISHED
[ https://issues.apache.org/jira/browse/YARN-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698337#comment-13698337 ] Mayank Bansal commented on YARN-245: I just tried this patch and it does not need rebasing. Thanks, Mayank Node Manager gives InvalidStateTransitonException for FINISH_APPLICATION at FINISHED Key: YARN-245 URL: https://issues.apache.org/jira/browse/YARN-245 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.2-alpha, 2.0.1-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-245-trunk-1.patch
{code:xml}
2012-11-25 12:56:11,795 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: FINISH_APPLICATION at FINISHED
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:398)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:520)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:512)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
	at java.lang.Thread.run(Thread.java:662)
2012-11-25 12:56:11,796 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1353818859056_0004 transitioned from FINISHED to null
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-295) Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl
[ https://issues.apache.org/jira/browse/YARN-295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698346#comment-13698346 ] Mayank Bansal commented on YARN-295: Latest patch does not need any rebasing. Thanks, Mayank Resource Manager throws InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED for RMAppAttemptImpl --- Key: YARN-295 URL: https://issues.apache.org/jira/browse/YARN-295 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.0.2-alpha, 2.0.1-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-295-trunk-1.patch, YARN-295-trunk-2.patch
{code:xml}
2012-12-28 14:03:56,956 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at ALLOCATED
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:490)
	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:80)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:433)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:414)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
	at java.lang.Thread.run(Thread.java:662)
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-845) RM crash with NPE on NODE_UPDATE
[ https://issues.apache.org/jira/browse/YARN-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698357#comment-13698357 ] Hadoop QA commented on YARN-845: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12590532/YARN-845-trunk-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1418//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1418//console This message is automatically generated. RM crash with NPE on NODE_UPDATE Key: YARN-845 URL: https://issues.apache.org/jira/browse/YARN-845 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 3.0.0, 2.1.0-beta Reporter: Arpit Gupta Assignee: Mayank Bansal Attachments: rm.log, YARN-845-trunk-1.patch, YARN-845-trunk-draft.patch the following stack trace is generated in rm
{code}
n, service: 68.142.246.147:45454 }, ] resource=<memory:1536, vCores:1> queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:44544, vCores:29>usedCapacity=0.90625, absoluteUsedCapacity=0.90625, numApps=1, numContainers=29 usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used=<memory:44544, vCores:29> cluster=<memory:49152, vCores:48>
2013-06-17 12:43:53,655 INFO capacity.ParentQueue (ParentQueue.java:completedContainer(696)) - completedContainer queue=root usedCapacity=0.90625 absoluteUsedCapacity=0.90625 used=<memory:44544, vCores:29> cluster=<memory:49152, vCores:48>
2013-06-17 12:43:53,656 INFO capacity.CapacityScheduler (CapacityScheduler.java:completedContainer(832)) - Application appattempt_1371448527090_0844_01 released container container_1371448527090_0844_01_05 on node: host: hostXX:45454 #containers=4 available=2048 used=6144 with event: FINISHED
2013-06-17 12:43:53,656 INFO capacity.CapacityScheduler (CapacityScheduler.java:nodeUpdate(661)) - Trying to fulfill reservation for application application_1371448527090_0844 on node: hostXX:45454
2013-06-17 12:43:53,656 INFO fica.FiCaSchedulerApp (FiCaSchedulerApp.java:unreserve(435)) - Application application_1371448527090_0844 unreserved on node host: hostXX:45454 #containers=4 available=2048 used=6144, currently has 4 at priority 20; currentReservation <memory:6144, vCores:4>
2013-06-17 12:43:53,656 INFO scheduler.AppSchedulingInfo (AppSchedulingInfo.java:updateResourceRequests(168)) - checking for deactivate...
2013-06-17 12:43:53,657 FATAL resourcemanager.ResourceManager (ResourceManager.java:run(422)) - Error in handling event type NODE_UPDATE to the scheduler
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:432)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.unreserve(LeafQueue.java:1416)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1346)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1221)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1180)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignReservedContainer(LeafQueue.java:939)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:803)
[jira] [Created] (YARN-895) If NameNode is in safemode when RM restarts, RM should wait instead of crashing.
Jian He created YARN-895: Summary: If NameNode is in safemode when RM restarts, RM should wait instead of crashing. Key: YARN-895 URL: https://issues.apache.org/jira/browse/YARN-895 Project: Hadoop YARN Issue Type: Sub-task Reporter: Jian He Assignee: Jian He -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-299) Node Manager throws org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE
[ https://issues.apache.org/jira/browse/YARN-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698414#comment-13698414 ] Mayank Bansal commented on YARN-299: This patch does not need rebasing. Thanks, Mayank Node Manager throws org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE --- Key: YARN-299 URL: https://issues.apache.org/jira/browse/YARN-299 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: 2.0.1-alpha, 2.0.0-alpha Reporter: Devaraj K Assignee: Mayank Bansal Attachments: YARN-299-trunk-1.patch
{code:xml}
2012-12-31 10:36:27,844 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Can't handle this event at current state: Current: [DONE], eventType: [RESOURCE_FAILED]
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: RESOURCE_FAILED at DONE
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:819)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:71)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:504)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:497)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
	at java.lang.Thread.run(Thread.java:662)
2012-12-31 10:36:27,845 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1356792558130_0002_01_01 transitioned from DONE to null
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-649) Make container logs available over HTTP in plain text
[ https://issues.apache.org/jira/browse/YARN-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698415#comment-13698415 ] Zhijie Shen commented on YARN-649: -- Read the patch quickly. It looks almost fine to me. One minor question: why does getLogs not support XML?
{code}
+  @GET
+  @Path("/containerlogs/{containerid}/{filename}")
+  @Produces({ MediaType.TEXT_PLAIN, MediaType.APPLICATION_JSON })
+  @Evolving
+  public Response getLogs(@PathParam("containerid") String containerIdStr,
+      @PathParam("filename") String filename) {
{code}
Here are some additional thoughts. Long running applications may have a big log file, such that it will take a long time to download the log file via the RESTful API. Consequently, the HTTP connection may time out before a complete log file is downloaded. Maybe it is good to zip the log file before sending it, and unzip it after receiving it. Moreover, it could be more advanced to query the part of the log which is recorded between timestamp1 and timestamp2. Just thinking out loud. Not sure it is required right now. Make container logs available over HTTP in plain text - Key: YARN-649 URL: https://issues.apache.org/jira/browse/YARN-649 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-649-2.patch, YARN-649-3.patch, YARN-649-4.patch, YARN-649.patch, YARN-752-1.patch It would be good to make container logs available over the REST API for MAPREDUCE-4362 and so that they can be accessed programmatically in general. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
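A hypothetical usage sketch for the endpoint in the quoted patch; the NM host, port, web-service prefix, container id, and file name are all placeholders:
{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Fetch a container's log file as plain text from the NM web service.
public class FetchContainerLog {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://nm-host:8042/ws/v1/node/containerlogs/"
        + "container_1371844267731_0001_01_000001/stderr"); // placeholders
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "text/plain");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);
      }
    }
  }
}
{code}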
[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698418#comment-13698418 ] Mayank Bansal commented on YARN-502: Latest patch does not need rebasing. Thanks, Mayank RM crash with NPE on NODE_REMOVED event with FairScheduler -- Key: YARN-502 URL: https://issues.apache.org/jira/browse/YARN-502 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.3-alpha Reporter: Lohit Vijayarenu Assignee: Mayank Bansal Attachments: YARN-502-trunk-1.patch, YARN-502-trunk-2.patch While running some test and adding/removing nodes, we see RM crashed with the below exception. We are testing with fair scheduler and running hadoop-2.0.3-alpha
{noformat}
2013-03-22 18:54:27,015 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating Node :55680 as it is now LOST
2013-03-22 18:54:27,015 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: :55680 Node Transitioned from UNHEALTHY to LOST
2013-03-22 18:54:27,015 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_REMOVED to the scheduler
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375)
	at java.lang.Thread.run(Thread.java:662)
2013-03-22 18:54:27,016 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped SelectChannelConnector@:50030
{noformat}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-649) Make container logs available over HTTP in plain text
[ https://issues.apache.org/jira/browse/YARN-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698444#comment-13698444 ] Sandy Ryza commented on YARN-649: - Thanks for taking a look, Zhijie.
bq. why does getLogs not support XML?
Oops, leaving in MediaType.APPLICATION_JSON was a mistake. My intention was actually to have it only support plain text. Thoughts? Regarding the zip files and the time-based queries, these seem like useful features, but I think they would be better for a separate JIRA, and can be added in a backwards-compatible manner with additional request parameters. My goal here was to implement the minimum needed to work on MAPREDUCE-4362 and YARN-675. Make container logs available over HTTP in plain text - Key: YARN-649 URL: https://issues.apache.org/jira/browse/YARN-649 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-649-2.patch, YARN-649-3.patch, YARN-649-4.patch, YARN-649.patch, YARN-752-1.patch It would be good to make container logs available over the REST API for MAPREDUCE-4362 and so that they can be accessed programmatically in general. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (YARN-7) Add support for DistributedShell to ask for CPUs along with memory
[ https://issues.apache.org/jira/browse/YARN-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-7: -- Attachment: YARN-7-v2.patch Sync up patch with latest trunk branch. Add support for DistributedShell to ask for CPUs along with memory -- Key: YARN-7 URL: https://issues.apache.org/jira/browse/YARN-7 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.3-alpha Reporter: Arun C Murthy Assignee: Junping Du Labels: patch Attachments: YARN-7.patch, YARN-7-v2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
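For reference, what asking for CPUs alongside memory looks like with the standard YARN records API (values are illustrative; presumably this is what the patch wires into the DistributedShell client):
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.Records;

// Build a container capability that carries vcores in addition to memory.
public class CpuMemoryCapability {
  public static Resource capability(int memoryMb, int vcores) {
    Resource capability = Records.newRecord(Resource.class);
    capability.setMemory(memoryMb);     // e.g. 1024
    capability.setVirtualCores(vcores); // e.g. 2 -- what YARN-7 adds to the shell
    return capability;
  }
}
{code}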
[jira] [Commented] (YARN-896) Roll up for long lived YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698500#comment-13698500 ] Robert Joseph Evans commented on YARN-896: -- During the most recent Hadoop Summit there was a developer meetup where we discussed some of these issues. This is to summarize what was discussed at that meeting and to add in a few things that have also been discussed on mailing lists and other places.
HDFS delegation tokens have a maximum lifetime. Currently, tokens submitted to the RM when the app master is launched will be renewed by the RM until the application finishes and the logs from the application have finished aggregating. The only token currently used by the YARN framework is the HDFS delegation token. This is used to read files from HDFS as part of the distributed cache and to write the aggregated logs out to HDFS. In order to support relaunching an app master after the maximum lifetime of the HDFS delegation token has expired, we either need to allow for tokens that do not expire or provide an API to allow the RM to replace the old token with a new one. Because removing the maximum lifetime of a token reduces the security of the cluster as a whole, I think it would be better to provide an API to replace the token with a new one. If we want to continue supporting log aggregation, we also need to provide a way for the Node Managers to get the new token too. It is assumed that each app master will also provide an API to get the new token so it can start using it.
Log aggregation is another issue, although not required for long lived applications to work. Logs are aggregated into HDFS when the application finishes. This is not really that useful for applications that are never intended to exit. Ideally the processing of logs by the node manager should be pluggable so that clusters and applications can select how and when logs are processed/displayed to the end user. Because many of these systems roll their logs to avoid filling up disks, we will probably need a protocol of some sort for the container to communicate with the Node Manager when logs are ready to be processed.
Another issue is to allow containers to outlive the app master that launched them and also to allow containers to outlive the node manager that launched them. This is especially critical for the stability of applications during rolling upgrades to YARN.
Roll up for long lived YARN --- Key: YARN-896 URL: https://issues.apache.org/jira/browse/YARN-896 Project: Hadoop YARN Issue Type: New Feature Reporter: Robert Joseph Evans YARN is intended to be general purpose, but it is missing some features to be able to truly support long lived applications and long lived containers. This ticket is intended to # discuss what is needed to support long lived processes # track the resulting JIRA. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
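To make the token-replacement idea concrete, a purely hypothetical sketch of the API shape being argued for; none of these names exist in YARN today:
{code:java}
import org.apache.hadoop.security.token.Token;

// Hypothetical: let a long-lived AM hand the RM a fresh HDFS delegation token
// before the old one hits its maximum lifetime; the RM would in turn need to
// propagate it to the NMs so log aggregation keeps working.
public interface TokenReplacementProtocol {
  /** Replace the stored delegation token for this application with a renewed one. */
  void replaceDelegationToken(String applicationId, Token<?> newToken);
}
{code}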
[jira] [Commented] (YARN-896) Roll up for long lived YARN
[ https://issues.apache.org/jira/browse/YARN-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698505#comment-13698505 ] Robert Joseph Evans commented on YARN-896: -- Another issue that has been discussed in the past is the impact that long lived processes can have on resource scheduling. It is possible for a long lived process to grab lots of resources and then never release them, even though it is using more resources than it would be allowed to have when the cluster is full. Recent preemption changes should be able to prevent this from happening between different queues/pools, but we may need to think about whether we need more control over this within a queue. Roll up for long lived YARN --- Key: YARN-896 URL: https://issues.apache.org/jira/browse/YARN-896 Project: Hadoop YARN Issue Type: New Feature Reporter: Robert Joseph Evans YARN is intended to be general purpose, but it is missing some features to be able to truly support long lived applications and long lived containers. This ticket is intended to # discuss what is needed to support long lived processes # track the resulting JIRA. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-661) NM fails to cleanup local directories for users
[ https://issues.apache.org/jira/browse/YARN-661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698518#comment-13698518 ] Omkar Vinit Joshi commented on YARN-661:
* First part is very much straightforward: container-executor.c already had some code to do this; just modified ResourceLocalizationService.cleanUpFilesFromSubDir to trigger it (basically swapping subDir with baseDir).
* I am exposing the deletion task dependency to the user via DeletionService. Now the user can specify a multilevel deletion task DAG, and the deletion service will take care of it once all parent (root) deletion tasks are started by the user after defining the dependency. I tested this locally on a secured cluster, but will add test cases to verify that the DAG actually works. I will update the patch with test cases; attaching the initial patch.
NM fails to cleanup local directories for users --- Key: YARN-661 URL: https://issues.apache.org/jira/browse/YARN-661 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta, 0.23.8 Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Attachments: YARN-661-20130701.patch YARN-71 added deletion of local directories on startup, but in practice it fails to delete the directories because of permission problems. The top-level usercache directory is owned by the user but is in a directory that is not writable by the user. Therefore the deletion of the user's usercache directory, as the user, fails due to lack of permissions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
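A hypothetical sketch of the dependency idea in the comment above — not the actual DeletionService API — where a deletion task becomes runnable only after all of its parents in the DAG complete:
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical DAG of deletion tasks: each child counts its unfinished
// parents and runs when the last parent finishes.
public class DeletionTaskSketch implements Runnable {
  private final String path;
  private final AtomicInteger pendingParents = new AtomicInteger();
  private final List<DeletionTaskSketch> children = new ArrayList<DeletionTaskSketch>();

  public DeletionTaskSketch(String path) {
    this.path = path;
  }

  public void addChild(DeletionTaskSketch child) {
    child.pendingParents.incrementAndGet();
    children.add(child);
  }

  @Override
  public void run() {
    System.out.println("deleting " + path); // stand-in for the actual deletion
    for (DeletionTaskSketch child : children) {
      if (child.pendingParents.decrementAndGet() == 0) {
        child.run(); // last parent finished: child is now runnable
      }
    }
  }
}
{code}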
[jira] [Updated] (YARN-661) NM fails to cleanup local directories for users
[ https://issues.apache.org/jira/browse/YARN-661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-661: --- Attachment: YARN-661-20130701.patch NM fails to cleanup local directories for users --- Key: YARN-661 URL: https://issues.apache.org/jira/browse/YARN-661 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.1.0-beta, 0.23.8 Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Attachments: YARN-661-20130701.patch YARN-71 added deletion of local directories on startup, but in practice it fails to delete the directories because of permission problems. The top-level usercache directory is owned by the user but is in a directory that is not writable by the user. Therefore the deletion of the user's usercache directory, as the user, fails due to lack of permissions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-502) RM crash with NPE on NODE_REMOVED event with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13698522#comment-13698522 ] Karthik Kambatla commented on YARN-502: --- Looks good to me. +1 RM crash with NPE on NODE_REMOVED event with FairScheduler -- Key: YARN-502 URL: https://issues.apache.org/jira/browse/YARN-502 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.0.3-alpha Reporter: Lohit Vijayarenu Assignee: Mayank Bansal Attachments: YARN-502-trunk-1.patch, YARN-502-trunk-2.patch While running some test and adding/removing nodes, we see RM crashed with the below exception. We are testing with fair scheduler and running hadoop-2.0.3-alpha
{noformat}
2013-03-22 18:54:27,015 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Deactivating Node :55680 as it is now LOST
2013-03-22 18:54:27,015 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: :55680 Node Transitioned from UNHEALTHY to LOST
2013-03-22 18:54:27,015 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_REMOVED to the scheduler
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeNode(FairScheduler.java:619)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:856)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:98)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:375)
	at java.lang.Thread.run(Thread.java:662)
2013-03-22 18:54:27,016 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
2013-03-22 18:54:27,020 INFO org.mortbay.log: Stopped SelectChannelConnector@:50030
{noformat}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira