[jira] [Commented] (YARN-1419) TestFifoScheduler.testAppAttemptMetrics fails intermittently under jdk7
[ https://issues.apache.org/jira/browse/YARN-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13825651#comment-13825651 ] Jason Lowe commented on YARN-1419: -- +1, lgtm. Committing this. TestFifoScheduler.testAppAttemptMetrics fails intermittently under jdk7 Key: YARN-1419 URL: https://issues.apache.org/jira/browse/YARN-1419 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 3.0.0, 2.3.0, 0.23.10 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Priority: Minor Labels: java7 Attachments: YARN-1419.patch, YARN-1419.patch QueueMetrics holds its data in a static variable, causing metrics to bleed over from test to test. clearQueueMetrics is to be called for tests that need to measure metrics correctly for a single test. jdk7 comes into play since tests are run out of order, and in that case the metrics become unreliable. -- This message was sent by Atlassian JIRA (v6.1#6144)
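The static-state bleed can be illustrated with a minimal, self-contained sketch; FakeQueueMetrics and its clearQueueMetrics are hypothetical stand-ins for the real QueueMetrics API, not the actual Hadoop code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for QueueMetrics: the registry lives in a
// static field, so counts survive from one test instance to the next.
class FakeQueueMetrics {
    private static final Map<String, Integer> REGISTRY = new HashMap<>();

    static void incrAppsSubmitted(String queue) {
        REGISTRY.merge(queue, 1, Integer::sum);
    }

    static int getAppsSubmitted(String queue) {
        return REGISTRY.getOrDefault(queue, 0);
    }

    // Analogue of clearQueueMetrics(): tests that assert on absolute
    // counts must call this first, or earlier tests bleed into them.
    static void clearQueueMetrics() {
        REGISTRY.clear();
    }
}

class MetricsBleedDemo {
    public static void main(String[] args) {
        // "Test 1" submits an app and bumps the static counter.
        FakeQueueMetrics.incrAppsSubmitted("default");

        // Under JDK7 the test execution order is not guaranteed, so
        // "test 2" may observe test 1's leftover count unless it clears first.
        FakeQueueMetrics.clearQueueMetrics();
        FakeQueueMetrics.incrAppsSubmitted("default");
        System.out.println(FakeQueueMetrics.getAppsSubmitted("default")); // prints 1, not 2
    }
}
```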
[jira] [Commented] (YARN-713) ResourceManager can exit unexpectedly if DNS is unavailable
[ https://issues.apache.org/jira/browse/YARN-713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13825677#comment-13825677 ] Hadoop QA commented on YARN-713: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614185/YARN-713.2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2474//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2474//console This message is automatically generated. 
ResourceManager can exit unexpectedly if DNS is unavailable --- Key: YARN-713 URL: https://issues.apache.org/jira/browse/YARN-713 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Omkar Vinit Joshi Priority: Critical Fix For: 2.3.0 Attachments: YARN-713.09052013.1.patch, YARN-713.09062013.1.patch, YARN-713.1.patch, YARN-713.2.patch, YARN-713.20130910.1.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch, YARN-713.patch As discussed in MAPREDUCE-5261, there's a possibility that a DNS outage could lead to an unhandled exception in the ResourceManager's AsyncDispatcher, and that ultimately would cause the RM to exit. The RM should not exit during DNS hiccups. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1421) Node managers will not receive application finish event where containers ran before RM restart
Omkar Vinit Joshi created YARN-1421: --- Summary: Node managers will not receive application finish event where containers ran before RM restart Key: YARN-1421 URL: https://issues.apache.org/jira/browse/YARN-1421 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Priority: Critical Problem :- Today for every application we track the node managers where containers ran. So when an application finishes it notifies all those node managers about the application finish event (via node manager heartbeat). However if the RM restarts then we forget this past information, and those node managers will never get the application finish event and will keep reporting finished applications. Proposed Solution :- Instead of remembering the node managers where containers ran for this particular application, it would be better if we depend on the node manager heartbeat to take this decision. i.e. when a node manager heartbeats saying it is running applications (app1, app2), then we should check those applications' status in RM's memory {code}rmContext.getRMApps(){code} and if either they are not found (very old applications) or they are in their final state (FINISHED, KILLED, FAILED), then we should immediately notify the node manager about the application finish event. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-584) In fair scheduler web UI, queues unexpand on refresh
[ https://issues.apache.org/jira/browse/YARN-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harshit Daga updated YARN-584: -- Attachment: YARN-584-branch-2.2.0.patch Updated patch with: - indentation and spacing after brackets, using an already present class as a reference - renaming of methods / inner class name In fair scheduler web UI, queues unexpand on refresh Key: YARN-584 URL: https://issues.apache.org/jira/browse/YARN-584 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Labels: newbie Attachments: YARN-584-branch-2.2.0.patch, YARN-584-branch-2.2.0.patch, YARN-584-branch-2.2.0.patch, YARN-584-branch-2.2.0.patch, YARN-584-branch-2.2.0.patch In the fair scheduler web UI, you can expand queue information. Refreshing the page causes the expansions to go away, which is annoying for someone who wants to monitor the scheduler page and needs to reopen all the queues they care about each time. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1421) Node managers will not receive application finish event where containers ran before RM restart
[ https://issues.apache.org/jira/browse/YARN-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1421: Description: Problem :- Today for every application we track the node managers where containers ran. So when application finishes it notifies all those node managers about application finish event (via node manager heartbeat). However if rm restarts then we forget this past information and those node managers will never get application finish event and will keep reporting finished applications. Proposed Solution :- Instead of remembering the node managers where containers ran for this particular application it would be better if we depend on node manager heartbeat to take this decision. i.e. when node manager heartbeats saying it is running application (app1, app2) then we should check those application's status in RM's memory {code}rmContext.getRMApps(){code} and if either they are not found (very old applications) or they are in their final state (FINISHED, KILLED, FAILED) then we should immediately notify the node manager about the application finish event. By doing this we are reducing the state which we need to store at RM after restart. was: Problem :- Today for every application we track the node managers where container ran. So when application finishes it notifies all those node managers about application finish event (via node manager heartbeat). However if rm restarts then we forget this past information and those node managers will never get application finish event and will keep reporting finished applications. Propose Solution :- Instead of remembering the node managers where containers ran for this particular application it would be better if we depend on node manager heartbeat to take this decision. i.e. 
when node manager heartbeats saying it is running application (app1, app2) then we should those application's status in RM's memory {code}rmContext.getRMApps(){code} and if either they are not found (very old applications) or they are in their final state (FINISHED, KILLED, FAILED) then we should immediately notify the node manager about the application finish event. Node managers will not receive application finish event where containers ran before RM restart -- Key: YARN-1421 URL: https://issues.apache.org/jira/browse/YARN-1421 Project: Hadoop YARN Issue Type: Bug Reporter: Omkar Vinit Joshi Assignee: Omkar Vinit Joshi Priority: Critical Problem :- Today for every application we track the node managers where containers ran. So when application finishes it notifies all those node managers about application finish event (via node manager heartbeat). However if rm restarts then we forget this past information and those node managers will never get application finish event and will keep reporting finished applications. Proposed Solution :- Instead of remembering the node managers where containers ran for this particular application it would be better if we depend on node manager heartbeat to take this decision. i.e. when node manager heartbeats saying it is running application (app1, app2) then we should check those application's status in RM's memory {code}rmContext.getRMApps(){code} and if either they are not found (very old applications) or they are in their final state (FINISHED, KILLED, FAILED) then we should immediately notify the node manager about the application finish event. By doing this we are reducing the state which we need to store at RM after restart. -- This message was sent by Atlassian JIRA (v6.1#6144)
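The proposed heartbeat-driven check can be sketched as follows; RmAppState and HeartbeatFinishCheck are illustrative names for this sketch, not the real rmContext.getRMApps() API:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative application states mirroring the final states named above.
enum RmAppState { RUNNING, FINISHED, KILLED, FAILED }

// Hypothetical stand-in for the RM-side lookup done via rmContext.getRMApps().
class HeartbeatFinishCheck {
    private final Map<String, RmAppState> rmApps = new HashMap<>();

    HeartbeatFinishCheck(Map<String, RmAppState> apps) {
        rmApps.putAll(apps);
    }

    private static boolean isFinal(RmAppState s) {
        return s == RmAppState.FINISHED || s == RmAppState.KILLED || s == RmAppState.FAILED;
    }

    // Given the app ids a node manager's heartbeat reports as running,
    // return the ids the RM should answer with an application-finish
    // event: apps unknown to the RM (very old) or already in a final state.
    List<String> appsToFinish(Collection<String> reportedRunning) {
        List<String> finish = new ArrayList<>();
        for (String appId : reportedRunning) {
            RmAppState s = rmApps.get(appId);
            if (s == null || isFinal(s)) {
                finish.add(appId);
            }
        }
        return finish;
    }
}
```

This keeps the decision stateless with respect to restart: the RM only needs its current app table, not a persisted record of which node managers ran containers.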
[jira] [Commented] (YARN-1312) Job History server queue attribute incorrectly reports default when username is actually used for queue at runtime
[ https://issues.apache.org/jira/browse/YARN-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13825746#comment-13825746 ] Harshit Daga commented on YARN-1312: Hi Philip, I would like to fix this issue. I have tried to reproduce it and also get the queue name as default (as you mentioned). Can you give me a starting point (in the code) for fixing the issue? Job History server queue attribute incorrectly reports default when username is actually used for queue at runtime Key: YARN-1312 URL: https://issues.apache.org/jira/browse/YARN-1312 Project: Hadoop YARN Issue Type: Bug Reporter: Philip Zeyliger If you run a MapReduce job with the fair scheduler and you query the JobHistory server for its metadata, you might see something like the following at http://jh_host:19888/ws/v1/history/mapreduce/jobs/job_1381878638171_0001/ {code}
<job>
  <startTime>1381890132608</startTime>
  <finishTime>1381890141988</finishTime>
  <id>job_1381878638171_0001</id>
  <name>TeraGen</name>
  <queue>default</queue>
  <user>hdfs</user>
  ...
</job>
{code} The same is true if you query the RM while it's running via http://rm_host:8088/ws/v1/cluster/apps/application_1381878638171_0002: {code}
<app>
  <id>application_1381878638171_0002</id>
  <user>hdfs</user>
  <name>TeraGen</name>
  <queue>default</queue>
  ...
</app>
{code} As it turns out, in both of these cases, the job is actually executing in root.hdfs and not in root.default because {{yarn.scheduler.fair.user-as-default-queue}} is set to true. This makes it hard to figure out after the fact (or during!) what queue the MR job was running under. -- This message was sent by Atlassian JIRA (v6.1#6144)
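The behavior being reported can be paraphrased with a small sketch; this is a hypothetical simplification of fair-scheduler queue placement under yarn.scheduler.fair.user-as-default-queue=true, not the real scheduler code:

```java
// Hypothetical simplification of fair-scheduler queue placement with
// yarn.scheduler.fair.user-as-default-queue=true: an app submitted to
// "default" actually lands in a per-user queue (e.g. root.hdfs), while
// the RM/JobHistory metadata still reports the submitted "default".
class QueuePlacementSketch {
    static String resolveQueue(String requestedQueue, String user,
                               boolean userAsDefaultQueue) {
        boolean isDefault = requestedQueue == null || requestedQueue.equals("default");
        if (userAsDefaultQueue && isDefault) {
            return "root." + user; // runtime queue diverges from the reported one
        }
        return isDefault ? "root.default" : requestedQueue;
    }
}
```

A fix would presumably need the RM/JobHistory metadata to record the resolved queue rather than the submitted one.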
[jira] [Updated] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable
[ https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-674: --- Attachment: YARN-674.8.patch Thanks [~vinodkv] for pointing it out; I didn't understand it earlier. Adding a synchronized block to the service state change. Slow or failing DelegationToken renewals on submission itself make RM unavailable - Key: YARN-674 URL: https://issues.apache.org/jira/browse/YARN-674 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-674.1.patch, YARN-674.2.patch, YARN-674.3.patch, YARN-674.4.patch, YARN-674.5.patch, YARN-674.5.patch, YARN-674.6.patch, YARN-674.7.patch, YARN-674.8.patch This was caused by YARN-280. A slow or down NameNode will make it look like the RM is unavailable, as it may run out of RPC handlers due to blocked client submissions. -- This message was sent by Atlassian JIRA (v6.1#6144)
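The general remedy, moving the possibly-blocking renewal off the RPC handler thread onto a background pool, can be sketched as below. This is illustrative only; the class and method names are hypothetical, not the actual RM implementation:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: if token renewal runs synchronously on the RPC
// handler thread, a slow or down NameNode pins that handler; handing
// the renewal to a background executor keeps the handlers free to
// serve other clients.
class AsyncRenewalSketch {
    private final ExecutorService renewerPool = Executors.newFixedThreadPool(4);

    // Stand-in for the slow renewal call (a NameNode/DNS round trip
    // that can block for a long time in the failure scenario above).
    private static void renewToken(String appId) throws InterruptedException {
        Thread.sleep(10);
    }

    // App submission returns immediately; the renewal completes later.
    Future<?> submitApplication(String appId) {
        return renewerPool.submit(() -> {
            try {
                renewToken(appId);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    // Drain the pool; returns true once all pending renewals have finished.
    boolean shutdownAndAwait() {
        renewerPool.shutdown();
        try {
            return renewerPool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```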
[jira] [Commented] (YARN-584) In fair scheduler web UI, queues unexpand on refresh
[ https://issues.apache.org/jira/browse/YARN-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13825776#comment-13825776 ] Hadoop QA commented on YARN-584: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614477/YARN-584-branch-2.2.0.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2475//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2475//console This message is automatically generated. 
In fair scheduler web UI, queues unexpand on refresh Key: YARN-584 URL: https://issues.apache.org/jira/browse/YARN-584 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Labels: newbie Attachments: YARN-584-branch-2.2.0.patch, YARN-584-branch-2.2.0.patch, YARN-584-branch-2.2.0.patch, YARN-584-branch-2.2.0.patch, YARN-584-branch-2.2.0.patch In the fair scheduler web UI, you can expand queue information. Refreshing the page causes the expansions to go away, which is annoying for someone who wants to monitor the scheduler page and needs to reopen all the queues they care about each time. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable
[ https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13825780#comment-13825780 ] Hadoop QA commented on YARN-674: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614481/YARN-674.8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2476//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2476//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2476//console This message is automatically generated. 
Slow or failing DelegationToken renewals on submission itself make RM unavailable - Key: YARN-674 URL: https://issues.apache.org/jira/browse/YARN-674 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-674.1.patch, YARN-674.2.patch, YARN-674.3.patch, YARN-674.4.patch, YARN-674.5.patch, YARN-674.5.patch, YARN-674.6.patch, YARN-674.7.patch, YARN-674.8.patch This was caused by YARN-280. A slow or down NameNode will make it look like the RM is unavailable, as it may run out of RPC handlers due to blocked client submissions. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-584) In fair scheduler web UI, queues unexpand on refresh
[ https://issues.apache.org/jira/browse/YARN-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13825804#comment-13825804 ] Sandy Ryza commented on YARN-584: - +1 In fair scheduler web UI, queues unexpand on refresh Key: YARN-584 URL: https://issues.apache.org/jira/browse/YARN-584 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Labels: newbie Attachments: YARN-584-branch-2.2.0.patch, YARN-584-branch-2.2.0.patch, YARN-584-branch-2.2.0.patch, YARN-584-branch-2.2.0.patch, YARN-584-branch-2.2.0.patch In the fair scheduler web UI, you can expand queue information. Refreshing the page causes the expansions to go away, which is annoying for someone who wants to monitor the scheduler page and needs to reopen all the queues they care about each time. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-584) In fair scheduler web UI, queues unexpand on refresh
[ https://issues.apache.org/jira/browse/YARN-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-584: Assignee: Harshit Daga In fair scheduler web UI, queues unexpand on refresh Key: YARN-584 URL: https://issues.apache.org/jira/browse/YARN-584 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Harshit Daga Labels: newbie Attachments: YARN-584-branch-2.2.0.patch, YARN-584-branch-2.2.0.patch, YARN-584-branch-2.2.0.patch, YARN-584-branch-2.2.0.patch, YARN-584-branch-2.2.0.patch In the fair scheduler web UI, you can expand queue information. Refreshing the page causes the expansions to go away, which is annoying for someone who wants to monitor the scheduler page and needs to reopen all the queues they care about each time. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1210) During RM restart, RM should start a new attempt only when previous attempt exits for real
[ https://issues.apache.org/jira/browse/YARN-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-1210: Attachment: YARN-1210.7.patch During RM restart, RM should start a new attempt only when previous attempt exits for real -- Key: YARN-1210 URL: https://issues.apache.org/jira/browse/YARN-1210 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-1210.1.patch, YARN-1210.2.patch, YARN-1210.3.patch, YARN-1210.4.patch, YARN-1210.4.patch, YARN-1210.5.patch, YARN-1210.6.patch, YARN-1210.7.patch When the RM recovers, it can wait for existing AMs to contact the RM back and then kill them forcefully before even starting a new AM. Worst case, the RM will start a new AppAttempt after waiting for 10 mins (the expiry interval). This way we'll minimize multiple AMs racing with each other. This can help issues with downstream components like Pig, Hive and Oozie during RM restart. In the meantime, new apps will proceed as usual while existing apps wait for recovery. This can continue to be useful after work-preserving restart, so that AMs which can properly sync back up with the RM can continue to run and those that don't are guaranteed to be killed before starting a new attempt. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (YARN-1422) RM CapacityScheduler can deadlock when getQueueUserAclInfo() is called and a container is completing
Adam Kawa created YARN-1422: --- Summary: RM CapacityScheduler can deadlock when getQueueUserAclInfo() is called and a container is completing Key: YARN-1422 URL: https://issues.apache.org/jira/browse/YARN-1422 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Affects Versions: 2.2.0 Reporter: Adam Kawa If getQueueUserAclInfo() on a parent/root queue (e.g. via CapacityScheduler.getQueueUserAclInfo) is called while a container is completing, the ResourceManager can deadlock. It is similar to https://issues.apache.org/jira/browse/YARN-325. *More details:* * Thread A 1) In a synchronized block of code (a lockid 0xc18d8870=LeafQueue.class), LeafQueue.completedContainer wants to inform the parent queue that a container is being completed and invokes the ParentQueue.completedContainer method. 3) ParentQueue.completedContainer waits to acquire a lock on itself (a lockid 0xc1846350=ParentQueue.class) to enter a synchronized block of code. It cannot acquire this lock, because Thread B already holds it. * Thread B 0) A moment earlier, CapacityScheduler.getQueueUserAclInfo is called. This method invokes a synchronized method on ParentQueue.class, i.e. ParentQueue.getQueueUserAclInfo (a lockid 0xc1846350=ParentQueue.class), and acquires the lock that Thread A will be waiting for. 2) Unluckily, ParentQueue.getQueueUserAclInfo iterates over the children queues' ACLs and wants to run a synchronized method, LeafQueue.getQueueUserAclInfo, but it does not have a lock on LeafQueue.class (a lockid 0xc18d8870=LeafQueue.class). This lock is already held by LeafQueue.completedContainer in Thread A. The order that causes the deadlock: B0 - A1 - B2 - A3. 
*Java Stacktrace* {code}
Found one Java-level deadlock:
=============================
"1956747953@qtp-109760451-1959":
  waiting to lock monitor 0x434e10c8 (object 0xc1846350, a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue),
  which is held by "IPC Server handler 39 on 8032"
"IPC Server handler 39 on 8032":
  waiting to lock monitor 0x422bbc58 (object 0xc18d8870, a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue),
  which is held by "ResourceManager Event Processor"
"ResourceManager Event Processor":
  waiting to lock monitor 0x434e10c8 (object 0xc1846350, a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue),
  which is held by "IPC Server handler 39 on 8032"

Java stack information for the threads listed above:
===================================================
"1956747953@qtp-109760451-1959":
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getUsedCapacity(ParentQueue.java:276)
  - waiting to lock <0xc1846350> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
  at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.CapacitySchedulerInfo.<init>(CapacitySchedulerInfo.java:49)
  at org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueuesBlock.render(CapacitySchedulerPage.java:203)
  at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
  at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
  at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
  at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
  at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
  at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845)
  at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56)
  at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
  at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
  at org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.scheduler(RmController.java:76)
  at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
  at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
  at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
  at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
  at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
  at
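The lock cycle above boils down to two code paths taking the same two monitors in opposite orders. Below is a minimal, self-contained model of it (class names are illustrative stand-ins, not the real scheduler code), together with the classic remedy of agreeing on one global lock order:

```java
import java.util.Arrays;
import java.util.List;

// Minimal model of the YARN-1422 cycle: completedContainer locks the
// leaf queue then the parent, while getQueueUserAclInfo locks the
// parent then the leaf. With unlucky timing (B0 - A1 - B2 - A3) each
// thread holds one monitor and waits forever on the other.
class LockOrderDemo {
    static final Object parentLock = new Object(); // ~ ParentQueue monitor
    static final Object leafLock = new Object();   // ~ LeafQueue monitor
    static int parentQueueUpdates = 0;

    // Thread A's path: leaf first, then parent.
    static void completedContainer() {
        synchronized (leafLock) {
            synchronized (parentLock) {
                parentQueueUpdates++; // propagate the completion upward
            }
        }
    }

    // Deadlock-prone version of Thread B's path: parent first, then
    // leaf. Running this concurrently with completedContainer can hang.
    static List<String> getQueueUserAclInfoUnsafe() {
        synchronized (parentLock) {
            synchronized (leafLock) {
                return Arrays.asList("SUBMIT_APPLICATIONS", "ADMINISTER_QUEUE");
            }
        }
    }

    // One classic remedy: every path acquires the monitors in the same
    // global order (leaf, then parent), so no wait cycle can form.
    static List<String> getQueueUserAclInfoSafe() {
        synchronized (leafLock) {
            synchronized (parentLock) {
                return Arrays.asList("SUBMIT_APPLICATIONS", "ADMINISTER_QUEUE");
            }
        }
    }
}
```

Whether a consistent lock order, or avoiding the nested locking altogether, is the right fix for the real CapacityScheduler is a design question for the patch; the sketch only demonstrates the cycle.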
[jira] [Updated] (YARN-1422) RM CapacityScheduler can deadlock when getQueueUserAclInfo() is called and a container is completing
[ https://issues.apache.org/jira/browse/YARN-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Kawa updated YARN-1422: Priority: Critical (was: Major) RM CapacityScheduler can deadlock when getQueueUserAclInfo() is called and a container is completing Key: YARN-1422 URL: https://issues.apache.org/jira/browse/YARN-1422 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Affects Versions: 2.2.0 Reporter: Adam Kawa Priority: Critical If getQueueUserAclInfo() on a parent/root queue (e.g. via CapacityScheduler.getQueueUserAclInfo) is called while a container is completing, the ResourceManager can deadlock. It is similar to https://issues.apache.org/jira/browse/YARN-325. *More details:* * Thread A 1) In a synchronized block of code (a lockid 0xc18d8870=LeafQueue.class), LeafQueue.completedContainer wants to inform the parent queue that a container is being completed and invokes the ParentQueue.completedContainer method. 3) ParentQueue.completedContainer waits to acquire a lock on itself (a lockid 0xc1846350=ParentQueue.class) to enter a synchronized block of code. It cannot acquire this lock, because Thread B already holds it. * Thread B 0) A moment earlier, CapacityScheduler.getQueueUserAclInfo is called. This method invokes a synchronized method on ParentQueue.class, i.e. ParentQueue.getQueueUserAclInfo (a lockid 0xc1846350=ParentQueue.class), and acquires the lock that Thread A will be waiting for. 2) Unluckily, ParentQueue.getQueueUserAclInfo iterates over the children queues' ACLs and wants to run a synchronized method, LeafQueue.getQueueUserAclInfo, but it does not have a lock on LeafQueue.class (a lockid 0xc18d8870=LeafQueue.class). This lock is already held by LeafQueue.completedContainer in Thread A. The order that causes the deadlock: B0 - A1 - B2 - A3. 
*Java Stacktrace* {code}
Found one Java-level deadlock:
=============================
"1956747953@qtp-109760451-1959":
  waiting to lock monitor 0x434e10c8 (object 0xc1846350, a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue),
  which is held by "IPC Server handler 39 on 8032"
"IPC Server handler 39 on 8032":
  waiting to lock monitor 0x422bbc58 (object 0xc18d8870, a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue),
  which is held by "ResourceManager Event Processor"
"ResourceManager Event Processor":
  waiting to lock monitor 0x434e10c8 (object 0xc1846350, a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue),
  which is held by "IPC Server handler 39 on 8032"

Java stack information for the threads listed above:
===================================================
"1956747953@qtp-109760451-1959":
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getUsedCapacity(ParentQueue.java:276)
  - waiting to lock <0xc1846350> (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
  at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.CapacitySchedulerInfo.<init>(CapacitySchedulerInfo.java:49)
  at org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueuesBlock.render(CapacitySchedulerPage.java:203)
  at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
  at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
  at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
  at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
  at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
  at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845)
  at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56)
  at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
  at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
  at org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.scheduler(RmController.java:76)
  at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
  at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
  at
[jira] [Commented] (YARN-1210) During RM restart, RM should start a new attempt only when previous attempt exits for real
[ https://issues.apache.org/jira/browse/YARN-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13825965#comment-13825965 ] Hadoop QA commented on YARN-1210: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614492/YARN-1210.7.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2477//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2477//console This message is automatically generated. 
During RM restart, RM should start a new attempt only when previous attempt exits for real -- Key: YARN-1210 URL: https://issues.apache.org/jira/browse/YARN-1210 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-1210.1.patch, YARN-1210.2.patch, YARN-1210.3.patch, YARN-1210.4.patch, YARN-1210.4.patch, YARN-1210.5.patch, YARN-1210.6.patch, YARN-1210.7.patch When the RM recovers, it can wait for existing AMs to contact the RM back and then kill them forcefully before even starting a new AM. Worst case, the RM will start a new AppAttempt after waiting for 10 mins (the expiry interval). This way we'll minimize multiple AMs racing with each other. This can help issues with downstream components like Pig, Hive and Oozie during RM restart. In the meantime, new apps will proceed as usual while existing apps wait for recovery. This can continue to be useful after work-preserving restart, so that AMs which can properly sync back up with the RM can continue to run and those that don't are guaranteed to be killed before starting a new attempt. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable
[ https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated YARN-674: --- Attachment: YARN-674.9.patch Slow or failing DelegationToken renewals on submission itself make RM unavailable - Key: YARN-674 URL: https://issues.apache.org/jira/browse/YARN-674 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-674.1.patch, YARN-674.2.patch, YARN-674.3.patch, YARN-674.4.patch, YARN-674.5.patch, YARN-674.5.patch, YARN-674.6.patch, YARN-674.7.patch, YARN-674.8.patch, YARN-674.9.patch This was caused by YARN-280. A slow or down NameNode will make it look like the RM is unavailable, as it may run out of RPC handlers due to blocked client submissions. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1266) inheriting Application client and History Protocol from base protocol and implement PB service and clients.
[ https://issues.apache.org/jira/browse/YARN-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13825976#comment-13825976 ] Mayank Bansal commented on YARN-1266: - [~zjshen] thanks for the review. bq. IMHO, application_base_protocol.proto should not be necessary, because the base interface is to extract the common code, not to be directly used from the RPC interface. We need it, as the service impl needs it. bq. 2. ApplicationClientProtocolPB and ApplicationHistoryProtocolPB don't need to extend ApplicationBaseProtocolService.BlockingInterface Done. Thanks, Mayank inheriting Application client and History Protocol from base protocol and implement PB service and clients. --- Key: YARN-1266 URL: https://issues.apache.org/jira/browse/YARN-1266 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-1266-1.patch, YARN-1266-2.patch, YARN-1266-3.patch, YARN-1266-4.patch Adding ApplicationHistoryProtocolPBService to make web apps work and changing yarn to run AHS as a separate process -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1266) inheriting Application client and History Protocol from base protocol and implement PB service and clients.
[ https://issues.apache.org/jira/browse/YARN-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-1266: Attachment: YARN-1266-4.patch Attaching the latest patch. Thanks, Mayank inheriting Application client and History Protocol from base protocol and implement PB service and clients. --- Key: YARN-1266 URL: https://issues.apache.org/jira/browse/YARN-1266 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-1266-1.patch, YARN-1266-2.patch, YARN-1266-3.patch, YARN-1266-4.patch Adding ApplicationHistoryProtocolPBService to make web apps work and changing yarn to run AHS as a separate process -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1422) RM CapacityScheduler can deadlock when getQueueUserAclInfo() is called and a container is completing
[ https://issues.apache.org/jira/browse/YARN-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13825988#comment-13825988 ] Omkar Vinit Joshi commented on YARN-1422: - Yes, this looks to be a problem. Check this [synchronization locking problem | https://issues.apache.org/jira/browse/YARN-897?focusedCommentId=13706284page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13706284]. The ordering should always be from root to leaf queue. I think there can be other places too where this ordering is mixed. RM CapacityScheduler can deadlock when getQueueUserAclInfo() is called and a container is completing Key: YARN-1422 URL: https://issues.apache.org/jira/browse/YARN-1422 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Affects Versions: 2.2.0 Reporter: Adam Kawa Priority: Critical If getQueueUserAclInfo() on a parent/root queue (e.g. via CapacityScheduler.getQueueUserAclInfo) is called, and a container is completing, then the ResourceManager can deadlock. It is similar to https://issues.apache.org/jira/browse/YARN-325. *More details:*
* Thread A
1) In a synchronized block of code (a lockid 0xc18d8870=LeafQueue.class), LeafQueue.completedContainer wants to inform the parent queue that a container is being completed and invokes the ParentQueue.completedContainer method.
3) ParentQueue.completedContainer waits to acquire a lock on itself (a lockid 0xc1846350=ParentQueue.class) to enter a synchronized block of code. It cannot acquire this lock, because Thread B already holds it.
* Thread B
0) A moment earlier, CapacityScheduler.getQueueUserAclInfo is called. This method invokes a synchronized method on ParentQueue.class, i.e. ParentQueue.getQueueUserAclInfo (a lockid 0xc1846350=ParentQueue.class), and acquires the lock that Thread A will be waiting for.
2) Unluckily, ParentQueue.getQueueUserAclInfo iterates over the children queues' ACLs and wants to run a synchronized method, LeafQueue.getQueueUserAclInfo, but it does not have a lock on LeafQueue.class (a lockid 0xc18d8870=LeafQueue.class). This lock is already held by LeafQueue.completedContainer in Thread A. The order that causes the deadlock: B0 - A1 - B2 - A3. *Java Stacktrace*
{code}
Found one Java-level deadlock:
==============================
1956747953@qtp-109760451-1959:
  waiting to lock monitor 0x434e10c8 (object 0xc1846350, a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue),
  which is held by IPC Server handler 39 on 8032
IPC Server handler 39 on 8032:
  waiting to lock monitor 0x422bbc58 (object 0xc18d8870, a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue),
  which is held by ResourceManager Event Processor
ResourceManager Event Processor:
  waiting to lock monitor 0x434e10c8 (object 0xc1846350, a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue),
  which is held by IPC Server handler 39 on 8032

Java stack information for the threads listed above:
====================================================
1956747953@qtp-109760451-1959:
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.getUsedCapacity(ParentQueue.java:276)
  - waiting to lock 0xc1846350 (a org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue)
  at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.CapacitySchedulerInfo.init(CapacitySchedulerInfo.java:49)
  at org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueuesBlock.render(CapacitySchedulerPage.java:203)
  at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
  at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
  at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
  at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
  at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
  at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845)
  at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56)
  at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
  at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
  at org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.scheduler(RmController.java:76)
  at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
  at
{code}
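The root-to-leaf ordering fix suggested in the comment above can be sketched as a toy example. The lock fields and method bodies below are illustrative stand-ins, not the actual CapacityScheduler code: the point is only that once both code paths acquire the parent-queue monitor before the leaf-queue monitor, the B0 - A1 - B2 - A3 interleaving can no longer deadlock.

```java
import java.util.concurrent.TimeUnit;

public class LockOrderDemo {
    static final Object parentQueueLock = new Object(); // stands in for the ParentQueue monitor
    static final Object leafQueueLock = new Object();   // stands in for the LeafQueue monitor

    // Thread A's completedContainer path, rewritten to lock root-to-leaf.
    static void completedContainer() {
        synchronized (parentQueueLock) {
            synchronized (leafQueueLock) {
                // update queue state here
            }
        }
    }

    // Thread B's getQueueUserAclInfo path: parent first, then each leaf.
    static void getQueueUserAclInfo() {
        synchronized (parentQueueLock) {
            synchronized (leafQueueLock) {
                // collect child-queue ACLs here
            }
        }
    }

    // Runs both paths concurrently; returns true iff both finish (no deadlock).
    static boolean runBoth() throws InterruptedException {
        Thread a = new Thread(LockOrderDemo::completedContainer);
        Thread b = new Thread(LockOrderDemo::getQueueUserAclInfo);
        a.start();
        b.start();
        a.join(TimeUnit.SECONDS.toMillis(5));
        b.join(TimeUnit.SECONDS.toMillis(5));
        return !a.isAlive() && !b.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runBoth()); // prints true
    }
}
```

With the original mixed ordering (one path leaf-then-parent, the other parent-then-leaf), the same two threads can each grab their first monitor and block forever on the second, which is exactly the cycle the thread dump shows.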
[jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
[ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13825989#comment-13825989 ] Bikas Saha commented on YARN-744: - Better name?
{code}
+AllocateResponseLock res = responseMap.get(applicationAttemptId);
{code}
reuse throwApplicationAttemptDoesNotExistInCacheException() in registerApplicationMaster()? use InvalidApplicationMasterRequestException or a new specific exception instead of generic RPCUtil.throwRemoteException()?
{code}
+  private void throwApplicationAttemptDoesNotExistInCacheException(
+      ApplicationAttemptId appAttemptId) throws YarnException {
+    String message = "Application doesn't exist in cache "
+        + appAttemptId;
+    LOG.error(message);
+    throw RPCUtil.getRemoteException(message);
+  }
{code}
The new logic is not the same as the old one. If the app is no longer in the cache then it would send a resync response. Now it will send a regular response instead of a resync response.
{code}
-    // before returning response, verify in sync
-    AllocateResponse oldResponse =
-        responseMap.put(appAttemptId, allocateResponse);
-    if (oldResponse == null) {
-      // appAttempt got unregistered, remove it back out
-      responseMap.remove(appAttemptId);
-      String message = "App Attempt removed from the cache during allocate "
-          + appAttemptId;
-      LOG.error(message);
-      return resync;
-    }
-
+    res.setAllocateResponse(allocateResponse);
     return allocateResponse;
{code}
Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated. - Key: YARN-744 URL: https://issues.apache.org/jira/browse/YARN-744 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Priority: Minor Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch, YARN-744-20130726.1.patch, YARN-744.1.patch, YARN-744.patch Looks like the lock taken in this is broken.
It takes a lock on lastResponse object and then puts a new lastResponse object into the map. At this point a new thread entering this function will get a new lastResponse object and will be able to take its lock and enter the critical section. Presumably we want to limit one response per app attempt. So the lock could be taken on the ApplicationAttemptId key of the response map object. -- This message was sent by Atlassian JIRA (v6.1#6144)
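The per-attempt lock pattern the review converges on can be sketched roughly as follows. This is a hypothetical, stripped-down model: String stands in for ApplicationAttemptId and Object for AllocateResponse, and only the AllocateResponseLock holder mirrors the patch snippet. The key property is that the lock object stays in the map for the attempt's lifetime, so two concurrent allocate() calls for the same attempt always contend on the same monitor, instead of on a response object that gets swapped out from under them.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class AllocateLockDemo {
    // Holder whose identity is stable per attempt; the response inside changes.
    static class AllocateResponseLock {
        private Object response;
        synchronized Object getAllocateResponse() { return response; }
        synchronized void setAllocateResponse(Object r) { response = r; }
    }

    static final Map<String, AllocateResponseLock> responseMap = new ConcurrentHashMap<>();

    static Object allocate(String appAttemptId, Object newResponse) {
        AllocateResponseLock lock = responseMap.get(appAttemptId);
        if (lock == null) {
            return "resync"; // attempt unknown or already unregistered
        }
        synchronized (lock) {
            // critical section: at most one in-flight allocate per attempt
            lock.setAllocateResponse(newResponse);
            return newResponse;
        }
    }

    public static void main(String[] args) {
        responseMap.put("appattempt_1", new AllocateResponseLock());
        System.out.println(allocate("appattempt_1", "response-1")); // response-1
        System.out.println(allocate("appattempt_2", "response-2")); // resync
    }
}
```

Locking on the map value's stable wrapper (or, equivalently, on the ApplicationAttemptId key as the description suggests) closes the window where a second thread sees a freshly inserted lastResponse and enters the critical section concurrently.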
[jira] [Commented] (YARN-1266) inheriting Application client and History Protocol from base protocol and implement PB service and clients.
[ https://issues.apache.org/jira/browse/YARN-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13826005#comment-13826005 ] Hadoop QA commented on YARN-1266: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614515/YARN-1266-4.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2478//console This message is automatically generated. inheriting Application client and History Protocol from base protocol and implement PB service and clients. --- Key: YARN-1266 URL: https://issues.apache.org/jira/browse/YARN-1266 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-1266-1.patch, YARN-1266-2.patch, YARN-1266-3.patch, YARN-1266-4.patch Adding ApplicationHistoryProtocolPBService to make web apps work and changing yarn to run AHS as a separate process -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-1403) Separate out configuration loading from QueueManager in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-1403: - Attachment: YARN-1403-2.patch Separate out configuration loading from QueueManager in the Fair Scheduler -- Key: YARN-1403 URL: https://issues.apache.org/jira/browse/YARN-1403 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1403-1.patch, YARN-1403-2.patch, YARN-1403.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-955) [YARN-321] Implementation of ApplicationHistoryProtocol
[ https://issues.apache.org/jira/browse/YARN-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13826014#comment-13826014 ] Mayank Bansal commented on YARN-955: Thanks [~zjshen] for review. bq. Please add the corresponding configs in yarn-default.xml as well. Done Thanks, Mayank [YARN-321] Implementation of ApplicationHistoryProtocol --- Key: YARN-955 URL: https://issues.apache.org/jira/browse/YARN-955 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-955-1.patch, YARN-955-2.patch, YARN-955-3.patch, YARN-955-4.patch, YARN-955-5.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-955) [YARN-321] Implementation of ApplicationHistoryProtocol
[ https://issues.apache.org/jira/browse/YARN-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mayank Bansal updated YARN-955: --- Attachment: YARN-955-5.patch Attaching latest patch. Thanks, Mayank [YARN-321] Implementation of ApplicationHistoryProtocol --- Key: YARN-955 URL: https://issues.apache.org/jira/browse/YARN-955 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-955-1.patch, YARN-955-2.patch, YARN-955-3.patch, YARN-955-4.patch, YARN-955-5.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable
[ https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13826037#comment-13826037 ] Hadoop QA commented on YARN-674: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614512/YARN-674.9.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2479//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/2479//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2479//console This message is automatically generated. 
Slow or failing DelegationToken renewals on submission itself make RM unavailable - Key: YARN-674 URL: https://issues.apache.org/jira/browse/YARN-674 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-674.1.patch, YARN-674.2.patch, YARN-674.3.patch, YARN-674.4.patch, YARN-674.5.patch, YARN-674.5.patch, YARN-674.6.patch, YARN-674.7.patch, YARN-674.8.patch, YARN-674.9.patch This was caused by YARN-280. A slow or down NameNode will make it look like the RM is unavailable, as it may run out of RPC handlers due to blocked client submissions. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-709) verify that new jobs submitted with old RM delegation tokens after RM restart are accepted
[ https://issues.apache.org/jira/browse/YARN-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13826039#comment-13826039 ] Hudson commented on YARN-709: - SUCCESS: Integrated in Hadoop-trunk-Commit #4754 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4754/]) YARN-709. Added tests to verify validity of delegation tokens and logging of appsummary after RM restart. Contributed by Jian He. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1543269) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java verify that new jobs submitted with old RM delegation tokens after RM restart are accepted -- Key: YARN-709 URL: https://issues.apache.org/jira/browse/YARN-709 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Fix For: 2.3.0 Attachments: YARN-709.1.patch More elaborate test for restoring RM delegation tokens on RM restart. New jobs with old RM delegation tokens should be accepted by new RM as long as the token is still valid -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (YARN-754) Allow for black-listing resources in FS
[ https://issues.apache.org/jira/browse/YARN-754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved YARN-754. - Resolution: Duplicate Closing as duplicate of YARN-1333 Allow for black-listing resources in FS --- Key: YARN-754 URL: https://issues.apache.org/jira/browse/YARN-754 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.0.4-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Resolved] (YARN-384) add virtual cores info to the queue metrics
[ https://issues.apache.org/jira/browse/YARN-384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved YARN-384. - Resolution: Duplicate Closing as duplicate of YARN-598 add virtual cores info to the queue metrics --- Key: YARN-384 URL: https://issues.apache.org/jira/browse/YARN-384 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Thomas Graves Now that we have cores as a resource in the scheduler we should add metrics so we can use usage - allocated, requested, whatever else might apply. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable
[ https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13826055#comment-13826055 ] Omkar Vinit Joshi commented on YARN-674: [~bikassaha] I completely missed your comment. What you are saying will not occur. {code} pool.allowCoreThreadTimeOut(true); {code} This should time out core threads if there are any lying around. Slow or failing DelegationToken renewals on submission itself make RM unavailable - Key: YARN-674 URL: https://issues.apache.org/jira/browse/YARN-674 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-674.1.patch, YARN-674.2.patch, YARN-674.3.patch, YARN-674.4.patch, YARN-674.5.patch, YARN-674.5.patch, YARN-674.6.patch, YARN-674.7.patch, YARN-674.8.patch, YARN-674.9.patch This was caused by YARN-280. A slow or down NameNode will make it look like the RM is unavailable, as it may run out of RPC handlers due to blocked client submissions. -- This message was sent by Atlassian JIRA (v6.1#6144)
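The behavior the comment relies on can be checked with a small, self-contained sketch (the pool sizes and timeouts here are illustrative, not the RM's actual renewal-pool configuration): with allowCoreThreadTimeOut(true), java.util.concurrent.ThreadPoolExecutor reclaims even core threads after the keep-alive interval once the pool goes idle, so no renewal threads are left lying around.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CoreTimeoutDemo {
    static boolean coreThreadsTimeOut() throws Exception {
        // 5 core threads, 100 ms keep-alive
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            5, 5, 100, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        pool.allowCoreThreadTimeOut(true);   // the call from the comment above
        pool.submit(() -> { }).get();        // force at least one thread to start
        int busy = pool.getPoolSize();       // >= 1 while the pool is warm
        Thread.sleep(1000);                  // wait well past the keep-alive
        int idle = pool.getPoolSize();       // idle core threads reclaimed
        pool.shutdown();
        return busy >= 1 && idle == 0;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(coreThreadsTimeOut());
    }
}
```

Without allowCoreThreadTimeOut(true), the same pool would keep its core threads alive indefinitely even when idle, which is the scenario Bikas was worried about.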
[jira] [Commented] (YARN-955) [YARN-321] Implementation of ApplicationHistoryProtocol
[ https://issues.apache.org/jira/browse/YARN-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13826062#comment-13826062 ] Hadoop QA commented on YARN-955: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614525/YARN-955-5.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2480//console This message is automatically generated. [YARN-321] Implementation of ApplicationHistoryProtocol --- Key: YARN-955 URL: https://issues.apache.org/jira/browse/YARN-955 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-955-1.patch, YARN-955-2.patch, YARN-955-3.patch, YARN-955-4.patch, YARN-955-5.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable
[ https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13826066#comment-13826066 ] Omkar Vinit Joshi commented on YARN-674: I think we should just ignore the findbugs warning; it is never going to occur. Plus, TestRMRestart is passing locally; there must be some race condition here that is not related to this patch. Slow or failing DelegationToken renewals on submission itself make RM unavailable - Key: YARN-674 URL: https://issues.apache.org/jira/browse/YARN-674 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-674.1.patch, YARN-674.2.patch, YARN-674.3.patch, YARN-674.4.patch, YARN-674.5.patch, YARN-674.5.patch, YARN-674.6.patch, YARN-674.7.patch, YARN-674.8.patch, YARN-674.9.patch This was caused by YARN-280. A slow or down NameNode will make it look like the RM is unavailable, as it may run out of RPC handlers due to blocked client submissions. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1403) Separate out configuration loading from QueueManager in the Fair Scheduler
[ https://issues.apache.org/jira/browse/YARN-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13826078#comment-13826078 ] Hadoop QA commented on YARN-1403: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614522/YARN-1403-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2481//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2481//console This message is automatically generated. Separate out configuration loading from QueueManager in the Fair Scheduler -- Key: YARN-1403 URL: https://issues.apache.org/jira/browse/YARN-1403 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-1403-1.patch, YARN-1403-2.patch, YARN-1403.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable
[ https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-674: - Attachment: YARN-674.10.patch +1 for the latest patch, save for the findbugs issue. Trying to fix it myself. Slow or failing DelegationToken renewals on submission itself make RM unavailable - Key: YARN-674 URL: https://issues.apache.org/jira/browse/YARN-674 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-674.1.patch, YARN-674.10.patch, YARN-674.2.patch, YARN-674.3.patch, YARN-674.4.patch, YARN-674.5.patch, YARN-674.5.patch, YARN-674.6.patch, YARN-674.7.patch, YARN-674.8.patch, YARN-674.9.patch This was caused by YARN-280. A slow or down NameNode will make it look like the RM is unavailable, as it may run out of RPC handlers due to blocked client submissions. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-955) [YARN-321] Implementation of ApplicationHistoryProtocol
[ https://issues.apache.org/jira/browse/YARN-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13826193#comment-13826193 ] Zhijie Shen commented on YARN-955: -- +1, LGTM [YARN-321] Implementation of ApplicationHistoryProtocol --- Key: YARN-955 URL: https://issues.apache.org/jira/browse/YARN-955 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Mayank Bansal Attachments: YARN-955-1.patch, YARN-955-2.patch, YARN-955-3.patch, YARN-955-4.patch, YARN-955-5.patch -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable
[ https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13826196#comment-13826196 ] Hadoop QA commented on YARN-674: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12614545/YARN-674.10.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/2482//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2482//console This message is automatically generated. 
Slow or failing DelegationToken renewals on submission itself make RM unavailable - Key: YARN-674 URL: https://issues.apache.org/jira/browse/YARN-674 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-674.1.patch, YARN-674.10.patch, YARN-674.2.patch, YARN-674.3.patch, YARN-674.4.patch, YARN-674.5.patch, YARN-674.5.patch, YARN-674.6.patch, YARN-674.7.patch, YARN-674.8.patch, YARN-674.9.patch This was caused by YARN-280. A slow or down NameNode will make it look like the RM is unavailable, as it may run out of RPC handlers due to blocked client submissions. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1210) During RM restart, RM should start a new attempt only when previous attempt exits for real
[ https://issues.apache.org/jira/browse/YARN-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13826208#comment-13826208 ] Vinod Kumar Vavilapalli commented on YARN-1210: --- +1, looks good. Checking this in. During RM restart, RM should start a new attempt only when previous attempt exits for real -- Key: YARN-1210 URL: https://issues.apache.org/jira/browse/YARN-1210 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Attachments: YARN-1210.1.patch, YARN-1210.2.patch, YARN-1210.3.patch, YARN-1210.4.patch, YARN-1210.4.patch, YARN-1210.5.patch, YARN-1210.6.patch, YARN-1210.7.patch When the RM recovers, it can wait for existing AMs to contact the RM back and then kill them forcefully before even starting a new AM. Worst case, the RM will start a new AppAttempt after waiting for 10 mins (the expiry interval). This way we'll minimize multiple AMs racing with each other. This can help issues with downstream components like Pig, Hive and Oozie during RM restart. Meanwhile, new apps will proceed as usual while existing apps wait for recovery. This can continue to be useful after work-preserving restart, so that AMs which can properly sync back up with the RM can continue to run, and those that don't are guaranteed to be killed before a new attempt is started. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1266) inheriting Application client and History Protocol from base protocol and implement PB service and clients.
[ https://issues.apache.org/jira/browse/YARN-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13826220#comment-13826220 ] Zhijie Shen commented on YARN-1266: --- 1. Let's mark getApplicationReport/getApplications stable, though they are moved to the base protocol. What do you think? 2. In ApplicationBaseProtocol, please do not mention the history server. 3. Where's ApplicationBaseProtocolPBClientImpl? 4. Should you modify ApplicationClientProtocolPBClientImpl as well? 5. ApplicationHistoryProtocolPBClientImpl should extend ApplicationBaseProtocolPBClientImpl. inheriting Application client and History Protocol from base protocol and implement PB service and clients. --- Key: YARN-1266 URL: https://issues.apache.org/jira/browse/YARN-1266 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal Attachments: YARN-1266-1.patch, YARN-1266-2.patch, YARN-1266-3.patch, YARN-1266-4.patch Adding ApplicationHistoryProtocolPBService to make web apps work and changing yarn to run AHS as a separate process -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (YARN-1318) Promote AdminService to an Always-On service and merge in RMHAProtocolService
[ https://issues.apache.org/jira/browse/YARN-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13826230#comment-13826230 ] Vinod Kumar Vavilapalli commented on YARN-1318: --- Apologies, was busy all of last week and was off from work for the latter part of it. The patch doesn't seem to apply anymore. Did a patch-file 'review' (not a fan of these), but quick comments: - Throw an exception instead of logging when an admin refresh*s on a standby? - RMContext was originally supposed to be a read-only interface. We did have to add a few setters to resolve circular dependencies, but I think it was a mistake to add the setters to the interface. With this patch, it becomes worse. Can we at least try to keep the interface read-only and add all the setters in the implementation only? Promote AdminService to an Always-On service and merge in RMHAProtocolService - Key: YARN-1318 URL: https://issues.apache.org/jira/browse/YARN-1318 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Blocker Labels: ha Attachments: yarn-1318-0.patch, yarn-1318-1.patch, yarn-1318-2.patch, yarn-1318-2.patch, yarn-1318-3.patch Per discussion in YARN-1068, we want AdminService to handle HA-admin operations in addition to the regular non-HA admin operations. To facilitate this, we need to move AdminService to be an Always-On service. -- This message was sent by Atlassian JIRA (v6.1#6144)
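The read-only-interface suggestion in the comment above can be sketched minimally. The interface and field names below are illustrative, not the real RMContext signatures: the interface exposes only getters, while the setters live solely on the implementation class, so any code that is handed the interface cannot mutate the context.

```java
// Read-only view: getters only, as the comment suggests for RMContext.
interface ReadOnlyRMContext {
    String getHAServiceState();
}

// Implementation: the only place where mutation is possible.
class RMContextSketch implements ReadOnlyRMContext {
    private String haServiceState;

    @Override
    public String getHAServiceState() { return haServiceState; }

    // Setter confined to the impl; not visible through the interface.
    void setHAServiceState(String state) { this.haServiceState = state; }
}

public class RMContextDemo {
    public static void main(String[] args) {
        RMContextSketch impl = new RMContextSketch();
        impl.setHAServiceState("active");
        ReadOnlyRMContext ctx = impl;                 // callers get the read-only view
        System.out.println(ctx.getHAServiceState());  // prints active
        // ctx.setHAServiceState("standby");          // would not compile
    }
}
```

This keeps the circular-dependency workaround (the impl still has setters) without leaking mutability into every consumer of the interface.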
[jira] [Commented] (YARN-1210) During RM restart, RM should start a new attempt only when previous attempt exits for real
[ https://issues.apache.org/jira/browse/YARN-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13826255#comment-13826255 ] Hudson commented on YARN-1210: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4757 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4757/]) YARN-1210. Changed RM to start new app-attempts on RM restart only after ensuring that previous AM exited or after expiry time. Contributed by Omkar Vinit Joshi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1543310) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NodeHeartbeatRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RegisterNodeManagerRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NodeHeartbeatRequestPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RegisterNodeManagerRequestPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/NodeStatus.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/records/impl/pb/NodeStatusPBImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdater.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptState.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockNM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/TestRMAppAttemptTransitions.java During RM restart, RM should start a new attempt only when previous attempt exits for real -- Key: YARN-1210 URL: https://issues.apache.org/jira/browse/YARN-1210 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Fix For: 2.3.0 Attachments: YARN-1210.1.patch, YARN-1210.2.patch, YARN-1210.3.patch, YARN-1210.4.patch, YARN-1210.4.patch, YARN-1210.5.patch, YARN-1210.6.patch, YARN-1210.7.patch When RM recovers, it can wait for existing AMs to contact RM back and then kill them forcefully before even starting a new AM. Worst case, RM will start a new AppAttempt after waiting for 10 mins
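The restart policy described above (launch a new attempt only once the previous AM is known to have exited, or after the liveness expiry window, worst case 10 minutes) reduces to a small predicate. This is an illustrative sketch, not the actual YARN state machine or its class names.

```java
// Hypothetical sketch of the attempt-restart decision in this issue.
public class AttemptRestartPolicy {
    private final long expiryMillis;

    public AttemptRestartPolicy(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    /**
     * @param previousAmExited    whether the old AM was observed to exit
     * @param millisSinceRecovery time elapsed since the RM recovered
     * @return whether a new app attempt may be started now
     */
    public boolean canStartNewAttempt(boolean previousAmExited, long millisSinceRecovery) {
        return previousAmExited || millisSinceRecovery >= expiryMillis;
    }

    public static void main(String[] args) {
        // 10-minute expiry, matching the worst case described above
        AttemptRestartPolicy policy = new AttemptRestartPolicy(10 * 60 * 1000L);
        System.out.println(policy.canStartNewAttempt(false, 0));               // must keep waiting
        System.out.println(policy.canStartNewAttempt(true, 0));                // AM exited: go
        System.out.println(policy.canStartNewAttempt(false, 11 * 60 * 1000L)); // expiry elapsed: go
    }
}
```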
[jira] [Commented] (YARN-674) Slow or failing DelegationToken renewals on submission itself make RM unavailable
[ https://issues.apache.org/jira/browse/YARN-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13826256#comment-13826256 ] Hudson commented on YARN-674: - SUCCESS: Integrated in Hadoop-trunk-Commit #4757 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4757/]) YARN-674. Fixed ResourceManager to renew DelegationTokens on submission asynchronously to work around potential slowness in state-store. Contributed by Omkar Vinit Joshi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1543312) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestAppManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/security/TestDelegationTokenRenewer.java Slow or failing DelegationToken renewals on submission itself make RM 
unavailable - Key: YARN-674 URL: https://issues.apache.org/jira/browse/YARN-674 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Omkar Vinit Joshi Fix For: 2.3.0 Attachments: YARN-674.1.patch, YARN-674.10.patch, YARN-674.2.patch, YARN-674.3.patch, YARN-674.4.patch, YARN-674.5.patch, YARN-674.5.patch, YARN-674.6.patch, YARN-674.7.patch, YARN-674.8.patch, YARN-674.9.patch This was caused by YARN-280. A slow or a down NameNode will make it look like RM is unavailable, as it may run out of RPC handlers due to blocked client submissions. -- This message was sent by Atlassian JIRA (v6.1#6144)
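The fix's core idea, renewing tokens asynchronously so a slow NameNode cannot pin the submission RPC handler, can be sketched with a background executor. The `TokenRenewer` interface, pool size, and method names below are hypothetical illustrations, not the real `DelegationTokenRenewer` API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Sketch: submission hands renewal to a background pool and returns
// immediately, so a blocked renewal no longer ties up an RPC handler.
public class AsyncRenewalSketch {
    interface TokenRenewer {
        void renew(String token) throws Exception; // may block on a slow NameNode
    }

    private final ExecutorService renewerPool = Executors.newFixedThreadPool(4);
    private final TokenRenewer renewer;

    public AsyncRenewalSketch(TokenRenewer renewer) {
        this.renewer = renewer;
    }

    /** Returns immediately; renewal proceeds off the submission thread. */
    public Future<?> submitApplication(String appId, String token) {
        return renewerPool.submit(() -> {
            renewer.renew(token);
            // on success the app would proceed to scheduling;
            // on failure it would be rejected asynchronously
            return null;
        });
    }

    public void shutdown() {
        renewerPool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        // simulate a sluggish NameNode with a short sleep
        AsyncRenewalSketch rm = new AsyncRenewalSketch(t -> TimeUnit.MILLISECONDS.sleep(50));
        Future<?> renewal = rm.submitApplication("app_0001", "hdfs-token");
        renewal.get(); // the demo waits; real submission code would not
        rm.shutdown();
    }
}
```

The design trade-off, as the JIRA description implies, is that submission acknowledgment no longer proves the token is renewable; failures must be surfaced to the app later.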