[jira] [Commented] (YARN-4411) ResourceManager IllegalArgumentException error

2015-12-06 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043844#comment-15043844
 ] 

Naganarasimha G R commented on YARN-4411:
-

Hi [~yarntime], 
This issue is not related to the modifications in your patch. JIRAs have 
already been raised for these reported bugs: YARN-4306 and YARN-4318.
Apart from those issues, the approach in your patch seems fine.

> ResourceManager IllegalArgumentException error
> --
>
> Key: YARN-4411
> URL: https://issues.apache.org/jira/browse/YARN-4411
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: yarntime
>Assignee: yarntime
> Attachments: YARN-4411.001.patch
>
>
> In version 2.7.1, line 1914 in RMAppAttemptImpl may cause an 
> IllegalArgumentException:
>   YarnApplicationAttemptState.valueOf(this.getState().toString())
> caused by this.getState() returning type RMAppAttemptState, which may not be 
> convertible to YarnApplicationAttemptState.
> {noformat}
> java.lang.IllegalArgumentException: No enum constant 
> org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.LAUNCHED_UNMANAGED_SAVING
> at java.lang.Enum.valueOf(Enum.java:236)
> at 
> org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.valueOf(YarnApplicationAttemptState.java:27)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.createApplicationAttemptReport(RMAppAttemptImpl.java:1870)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationAttemptReport(ClientRMService.java:355)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationAttemptReport(ApplicationClientProtocolPBServiceImpl.java:355)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue

2015-12-06 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043940#comment-15043940
 ] 

Sunil G commented on YARN-4416:
---

A typo in my earlier comment:
Almost all APIs exposed from LeafQueue are used with the lock from Queue ==> 
Almost all APIs exposed from *AbstractComparatorOrderingPolicy* are used with 
the lock from Queue

> Deadlock due to synchronised get Methods in AbstractCSQueue
> ---
>
> Key: YARN-4416
> URL: https://issues.apache.org/jira/browse/YARN-4416
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.1
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, 
> deadlock.log
>
>
> While debugging in Eclipse, I came across a scenario where I had to find out 
> the name of the queue, but every time I tried to inspect the queue it hung. 
> On seeing the stack I realized there was a deadlock, but on analysis I found 
> it was only due to *queue.toString()* during debugging, as 
> {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized.
> Still, I feel {{AbstractCSQueue}}'s getter methods need not be synchronized 
> and would be better handled through read and write locks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4293) ResourceUtilization should be a part of yarn node CLI

2015-12-06 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043921#comment-15043921
 ] 

Sunil G commented on YARN-4293:
---

Hi [~Naganarasimha Garla],
Extremely sorry for the mix-up. I was trying to get the CLI up here, and ended 
up doing the NodeReport as well, since we needed that resource info. I feel it 
can be marked as a duplicate here if that's fine.

> ResourceUtilization should be a part of yarn node CLI
> -
>
> Key: YARN-4293
> URL: https://issues.apache.org/jira/browse/YARN-4293
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: 0001-YARN-4293.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue

2015-12-06 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043909#comment-15043909
 ] 

Sunil G commented on YARN-4416:
---

bq. I have added locks for the access of schedulableEntities in 
AbstractComparatorOrderingPolicy, but I am not completely sure of the 
modifications, as there is already synchronization on entitiesToReorder. So I 
would like additional (focused) review of this part in particular.

AbstractComparatorOrderingPolicy, i.e. the OrderingPolicy, is accessed under 
the lock from LeafQueue. This dependency does exist now. I feel it is better 
to access it via the LeafQueue lock.

> Deadlock due to synchronised get Methods in AbstractCSQueue
> ---
>
> Key: YARN-4416
> URL: https://issues.apache.org/jira/browse/YARN-4416
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.1
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, 
> deadlock.log
>
>
> While debugging in Eclipse, I came across a scenario where I had to find out 
> the name of the queue, but every time I tried to inspect the queue it hung. 
> On seeing the stack I realized there was a deadlock, but on analysis I found 
> it was only due to *queue.toString()* during debugging, as 
> {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized.
> Still, I feel {{AbstractCSQueue}}'s getter methods need not be synchronized 
> and would be better handled through read and write locks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue

2015-12-06 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043932#comment-15043932
 ] 

Sunil G commented on YARN-4416:
---

Sorry, I was not very clear in my earlier comments.

Almost all APIs exposed from LeafQueue are used with the lock from Queue. 
Hence, with this new lock, we are getting a lock hierarchy. Is this 
intentional? Because we are going to have a new lock in a major code path.

Also, in LeafQueue#assignContainers:
{code}
for (Iterator<FiCaSchedulerApp> assignmentIterator =
    orderingPolicy.getAssignmentIterator(); assignmentIterator.hasNext();) {
  FiCaSchedulerApp application = assignmentIterator.next();

{code}

we access the iterator from the ordering policy under the LeafQueue lock. I 
can see that some methods in LeafQueue now have the LeafQueue lock removed and 
use only the new lock from OrderingPolicy. So we need to be slightly careful 
here, as we should ensure we do not delete any item without the LeafQueue 
lock. (We are currently deleting under the LeafQueue lock, hence no issues as 
of now.)

> Deadlock due to synchronised get Methods in AbstractCSQueue
> ---
>
> Key: YARN-4416
> URL: https://issues.apache.org/jira/browse/YARN-4416
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.1
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, 
> deadlock.log
>
>
> While debugging in Eclipse, I came across a scenario where I had to find out 
> the name of the queue, but every time I tried to inspect the queue it hung. 
> On seeing the stack I realized there was a deadlock, but on analysis I found 
> it was only due to *queue.toString()* during debugging, as 
> {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized.
> Still, I feel {{AbstractCSQueue}}'s getter methods need not be synchronized 
> and would be better handled through read and write locks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics

2015-12-06 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043953#comment-15043953
 ] 

Naganarasimha G R commented on YARN-4304:
-

[~sunilg],
Tested the latest patch on trunk and it seems to work fine; I am not facing 
the web UI rendering issue (NPE) which was coming up with the initial patch. 
W.r.t. the implementation, I feel [~wangda]'s comment ??ResourcesInfo's 
constructor shouldn't relate to LeafQueue and considerAMUsage, it should 
simply copy fields from ResourceUsage.?? is valid, and even if required, maybe 
we can extend ResourceInfo for the LeafQueue and have the specific fields 
there.


> AM max resource configuration per partition to be displayed/updated correctly 
> in UI and in various partition related metrics
> 
>
> Key: YARN-4304
> URL: https://issues.apache.org/jira/browse/YARN-4304
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4304.patch, 0002-YARN-4304.patch, 
> 0003-YARN-4304.patch, 0004-YARN-4304.patch, REST_and_UI.zip
>
>
> As we are supporting per-partition max AM resource percentage configuration, 
> the UI and various metrics also need to display the correct configurations 
> related to it.
> For example, the current UI still shows the am-resource percentage at the 
> queue level. This is to be updated correctly when label configuration is 
> used.
> - Display max-am-percentage per partition in the Scheduler UI (for labels as 
> well) and on the ClusterMetrics page
> - Update queue/partition related metrics w.r.t. per-partition 
> am-resource-percentage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue

2015-12-06 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043941#comment-15043941
 ] 

Naganarasimha G R commented on YARN-4416:
-

[~sunilg],
bq. Hence, with this new lock, we are getting a lock hierarchy. Is this intentional?
Yes Sunil, even I was skeptical about it, but went ahead with [~wangda]'s 
[suggestion|https://issues.apache.org/jira/browse/YARN-4416?focusedCommentId=15038560&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15038560]
 as there were similar read/write locks already held in queueCapacity and 
resource-usage, and some methods were already updating them without locks on 
LeafQueue. Further, I was of the opinion that the ordering policy should not 
depend on LeafQueue for ensuring multithreaded consistency, as it is an 
independent entity and can be used elsewhere.

bq. we access the iterator from the ordering policy under the LeafQueue lock. 
I can see that some methods in LeafQueue now have the LeafQueue lock removed 
and use only the new lock from OrderingPolicy.
Still, all the methods which modify the ordering policy do so while holding 
the lock on LeafQueue, and if any other place modifies it in future, it needs 
to ensure the LeafQueue lock is held first. Also, a TreeSet iterator fails 
fast when the underlying set gets modified, as the small demo below shows.
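
For illustration, a minimal, self-contained demo of that fail-fast behaviour 
(plain JDK collections, not the scheduler's actual SchedulableEntity set):
{code}
// Minimal demo: a TreeSet iterator is fail-fast, so a structural modification
// made outside the iterator causes the next iterator step to throw
// ConcurrentModificationException.
import java.util.Iterator;
import java.util.TreeSet;

public class FailFastDemo {
  public static void main(String[] args) {
    TreeSet<String> apps = new TreeSet<>();
    apps.add("app-1");
    apps.add("app-2");

    Iterator<String> it = apps.iterator();
    it.next();                // returns "app-1"
    apps.remove("app-2");     // structural modification outside the iterator
    it.next();                // throws ConcurrentModificationException
  }
}
{code}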

But anyway, we need to evaluate the impact on performance. Planning to run SLS 
with and without these changes to validate it.

Further, IMO we could have a read/write lock in LeafQueue, which would better 
avoid all the synchronized locks on LeafQueue for the getters (reads) in the 
leaf queue (a minimal sketch of that option follows). Thoughts?
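
A hedged sketch of that read/write-lock option (class and field names here are 
illustrative, not an actual patch):
{code}
// Hedged sketch: replacing synchronized getters with a ReentrantReadWriteLock,
// so concurrent readers (web UI, CLI, REST) do not block each other and only
// writers take the exclusive lock. Names are illustrative.
import java.util.concurrent.locks.ReentrantReadWriteLock;

abstract class AbstractQueueSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private float absoluteUsedCapacity;

  public float getAbsoluteUsedCapacity() {
    lock.readLock().lock();        // shared: many readers may hold this at once
    try {
      return absoluteUsedCapacity;
    } finally {
      lock.readLock().unlock();
    }
  }

  void setAbsoluteUsedCapacity(float value) {
    lock.writeLock().lock();       // exclusive: blocks both readers and writers
    try {
      this.absoluteUsedCapacity = value;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}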


> Deadlock due to synchronised get Methods in AbstractCSQueue
> ---
>
> Key: YARN-4416
> URL: https://issues.apache.org/jira/browse/YARN-4416
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.1
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, 
> deadlock.log
>
>
> While debugging in Eclipse, I came across a scenario where I had to find out 
> the name of the queue, but every time I tried to inspect the queue it hung. 
> On seeing the stack I realized there was a deadlock, but on analysis I found 
> it was only due to *queue.toString()* during debugging, as 
> {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized.
> Still, I feel {{AbstractCSQueue}}'s getter methods need not be synchronized 
> and would be better handled through read and write locks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4293) ResourceUtilization should be a part of yarn node CLI

2015-12-06 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043919#comment-15043919
 ] 

Naganarasimha G R commented on YARN-4293:
-

Hi [~sunilg],
It seems you have handled the scope of YARN-4291 in this JIRA itself, so shall 
I close YARN-4291?

> ResourceUtilization should be a part of yarn node CLI
> -
>
> Key: YARN-4293
> URL: https://issues.apache.org/jira/browse/YARN-4293
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Sunil G
> Attachments: 0001-YARN-4293.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4291) ResourceUtilization should be a part of NodeReport API.

2015-12-06 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R resolved YARN-4291.
-
Resolution: Done

The scope of this JIRA has been handled as part of YARN-4293, hence closing 
this issue.

> ResourceUtilization should be a part of NodeReport API.
> ---
>
> Key: YARN-4291
> URL: https://issues.apache.org/jira/browse/YARN-4291
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4072) ApplicationHistoryServer, WebAppProxyServer, NodeManager and ResourceManager to support JvmPauseMonitor as a service

2015-12-06 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043971#comment-15043971
 ] 

Steve Loughran commented on YARN-4072:
--

+1

> ApplicationHistoryServer, WebAppProxyServer, NodeManager and ResourceManager 
> to support JvmPauseMonitor as a service
> 
>
> Key: YARN-4072
> URL: https://issues.apache.org/jira/browse/YARN-4072
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.8.0
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Minor
> Attachments: 0001-YARN-4072.patch, 0002-YARN-4072.patch, 
> HADOOP-12321-005-aggregated.patch, HADOOP-12407-001.patch
>
>
> As JvmPauseMonitor has been made an AbstractService, corresponding method 
> changes are needed in all the places which use the monitor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient

2015-12-06 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3367:

Attachment: YARN-3367-feature-YARN-2928.v1.002.patch

Thanks for the clarification, [~djp].
So, IIUC from your reply, can I take your answer to my query ??Is it required 
to ensure all the async events are also pushed along with the current sync 
event?? as yes?

Also, can you take a look at the other 4 queries which I had 
[posted|https://issues.apache.org/jira/browse/YARN-3367?focusedCommentId=14732065&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14732065]
 initially?

bq. Btw, cancel the patch as it is out of sync with the new branch.
Not sure why it failed, as I was able to apply it successfully on my local 
branch! Recreating the patch and uploading it again.


[~gtCarrera],
bq. I looked at the patch. One general comment is that the logic of 
TimelineEntityAsyncDispatcher is pretty similar to AsyncDispatcher. Since the 
code segments handling concurrency are normally considered non-trivial, maybe 
we should refactor AsyncDispatcher's code and reuse it, rather than follow the 
logic here?
There are 2 aspects to consider:
* Basically, we would require some parameterized generic class here, so that 
the queue can hold not just {{Event}} but any object (see the sketch below). 
The problem is that we are doing this in a branch; if we introduce such a 
class, all the places using AsyncDispatcher might require changes, which could 
be cumbersome to merge, as the changes would be in many places!
* Also, based on [~djp]'s comment, we need additional logic to ensure that 
sync puts are blocked till all events up to the sync event are pushed. All of 
this would need to be handled in the AsyncDispatcher.

Considering this, my opinion would be *not* to modify the AsyncDispatcher. 
Thoughts?
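
For illustration only, a rough sketch of the kind of parameterized dispatcher 
discussed above: a single consumer thread draining a typed queue. The class 
and method names are assumptions, not the actual patch:
{code}
// Hedged sketch: a generic single-consumer dispatch loop whose queue is
// parameterized on the item type (e.g. timeline entities) instead of being
// tied to Event. Names are illustrative only.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

public class TypedAsyncDispatcher<T> {
  private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
  private final Thread eventLoop;

  public TypedAsyncDispatcher(Consumer<T> handler) {
    eventLoop = new Thread(() -> {
      while (!Thread.currentThread().isInterrupted()) {
        try {
          handler.accept(queue.take()); // e.g. deliver an entity via REST
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // stop() was called; exit loop
        }
      }
    }, "typed-dispatcher");
    eventLoop.setDaemon(true);
    eventLoop.start();
  }

  public void dispatch(T item) {
    queue.add(item);
  }

  public void stop() {
    eventLoop.interrupt();
  }
}
{code}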


> Replace starting a separate thread for post entity with event loop in 
> TimelineClient
> 
>
> Key: YARN-3367
> URL: https://issues.apache.org/jira/browse/YARN-3367
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Junping Du
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3367-feature-YARN-2928.v1.002.patch, 
> YARN-3367.YARN-2928.001.patch
>
>
> Since YARN-3039, we have a loop in TimelineClient to wait for 
> collectorServiceAddress to be ready before posting any entity. In consumers 
> of TimelineClient (like the AM), we start a new thread for each call to 
> avoid a potential deadlock in the main thread. This approach has at least 3 
> major defects:
> 1. The consumer needs additional code to wrap each call to putEntities() in 
> TimelineClient in a thread.
> 2. It costs many unnecessary thread resources.
> 3. The sequence of events can be out of order, because each posting thread 
> exits the waiting loop at a random time.
> We should have something like an event loop on the TimelineClient side: 
> putEntities() only puts the entities into a queue, and a separate thread 
> delivers the queued entities to the collector via REST calls.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4411) ResourceManager IllegalArgumentException error

2015-12-06 Thread yarntime (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043881#comment-15043881
 ] 

yarntime commented on YARN-4411:


OK, I get it. Thank you for your help.

> ResourceManager IllegalArgumentException error
> --
>
> Key: YARN-4411
> URL: https://issues.apache.org/jira/browse/YARN-4411
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: yarntime
>Assignee: yarntime
> Attachments: YARN-4411.001.patch
>
>
> In version 2.7.1, line 1914 in RMAppAttemptImpl may cause an 
> IllegalArgumentException:
>   YarnApplicationAttemptState.valueOf(this.getState().toString())
> caused by this.getState() returning type RMAppAttemptState, which may not be 
> convertible to YarnApplicationAttemptState.
> {noformat}
> java.lang.IllegalArgumentException: No enum constant 
> org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.LAUNCHED_UNMANAGED_SAVING
> at java.lang.Enum.valueOf(Enum.java:236)
> at 
> org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.valueOf(YarnApplicationAttemptState.java:27)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.createApplicationAttemptReport(RMAppAttemptImpl.java:1870)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationAttemptReport(ClientRMService.java:355)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationAttemptReport(ApplicationClientProtocolPBServiceImpl.java:355)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2885) Create AMRMProxy request interceptor for distributed scheduling decisions for queueable containers

2015-12-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043896#comment-15043896
 ] 

Hadoop QA commented on YARN-2885:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 19 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
13s {color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 1s 
{color} | {color:green} yarn-2877 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 18s 
{color} | {color:green} yarn-2877 passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
30s {color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 37s 
{color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
10s {color} | {color:green} yarn-2877 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
39s {color} | {color:green} yarn-2877 passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 23s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in yarn-2877 failed 
with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 31s 
{color} | {color:green} yarn-2877 passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 2s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 2s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 11m 49s 
{color} | {color:red} hadoop-yarn-project_hadoop-yarn-jdk1.8.0_66 with JDK 
v1.8.0_66 generated 1 new issues (was 14, now 14). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 17s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 17s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 14m 6s {color} 
| {color:red} hadoop-yarn-project_hadoop-yarn-jdk1.7.0_85 with JDK v1.7.0_85 
generated 1 new issues (was 15, now 15). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 17s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 28s 
{color} | {color:red} Patch generated 128 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 555, now 678). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
9s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 1s 
{color} | {color:red} The patch has 18 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 12s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 introduced 3 new FindBugs issues. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 40s 
{color} | {color:red} hadoop-yarn-api in the patch failed with JDK v1.8.0_66. 
{color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 24s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 49s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 

[jira] [Commented] (YARN-4411) ResourceManager IllegalArgumentException error

2015-12-06 Thread yarntime (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043859#comment-15043859
 ] 

yarntime commented on YARN-4411:


Hi [~Naganarasimha],
Thanks for your reply. Is there any way to avoid these errors when I submit 
the patch?
Thank you very much.


> ResourceManager IllegalArgumentException error
> --
>
> Key: YARN-4411
> URL: https://issues.apache.org/jira/browse/YARN-4411
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: yarntime
>Assignee: yarntime
> Attachments: YARN-4411.001.patch
>
>
> In version 2.7.1, line 1914 in RMAppAttemptImpl may cause an 
> IllegalArgumentException:
>   YarnApplicationAttemptState.valueOf(this.getState().toString())
> caused by this.getState() returning type RMAppAttemptState, which may not be 
> convertible to YarnApplicationAttemptState.
> {noformat}
> java.lang.IllegalArgumentException: No enum constant 
> org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.LAUNCHED_UNMANAGED_SAVING
> at java.lang.Enum.valueOf(Enum.java:236)
> at 
> org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.valueOf(YarnApplicationAttemptState.java:27)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.createApplicationAttemptReport(RMAppAttemptImpl.java:1870)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationAttemptReport(ClientRMService.java:355)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationAttemptReport(ApplicationClientProtocolPBServiceImpl.java:355)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4411) ResourceManager IllegalArgumentException error

2015-12-06 Thread yarntime (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043835#comment-15043835
 ] 

yarntime commented on YARN-4411:


Hi [~djp], would you please help me with this problem? Thank you very much.
I submitted a simple patch which replaces
YarnApplicationAttemptState.valueOf(this.getState().toString())
with
this.createApplicationAttemptState(),
but it cannot pass the unit tests in Jenkins.
The error message looks like this:
{noformat}
java.net.UnknownHostException: Invalid host name: local host is: (unknown); 
destination host is: "278445b1a8f3":8030; java.net.UnknownHostException; For 
more details see:  http://wiki.apache.org/hadoop/UnknownHost
at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:413)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1489)
at org.apache.hadoop.ipc.Client.call(Client.java:1424)
at org.apache.hadoop.ipc.Client.call(Client.java:1385)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy15.registerApplicationMaster(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106)
at org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:281)

testUnauthorizedAccess[1](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization)  Time elapsed: 2.68 sec  <<< ERROR!
java.net.UnknownHostException: Invalid host name: local host is: (unknown); 
destination host is: "278445b1a8f3":8030; java.net.UnknownHostException; For 
more details see:  http://wiki.apache.org/hadoop/UnknownHost
at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:413)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1489)
at org.apache.hadoop.ipc.Client.call(Client.java:1424)
at org.apache.hadoop.ipc.Client.call(Client.java:1385)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy15.registerApplicationMaster(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106)
at org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:281)
{noformat}

I'm looking forward to your response. Thanks.
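
For reference, a hedged sketch of the kind of internal-to-public state mapping 
a createApplicationAttemptState() helper would perform. The mapping choices 
for the internal-only states below are assumptions for illustration, not the 
actual Hadoop code:
{code}
// Hedged sketch: map the RM-internal attempt state to the public API enum so
// that internal-only states (e.g. LAUNCHED_UNMANAGED_SAVING) never reach
// Enum.valueOf() with a constant the public enum does not define.
// The specific mappings below are assumptions for illustration.
import org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptState;

public final class AttemptStateMapper {
  private AttemptStateMapper() {}

  public static YarnApplicationAttemptState toPublicState(RMAppAttemptState state) {
    switch (state) {
      case LAUNCHED_UNMANAGED_SAVING:
        return YarnApplicationAttemptState.LAUNCHED;   // assumed closest public state
      case FINAL_SAVING:
        return YarnApplicationAttemptState.FINISHING;  // assumed closest public state
      default:
        // the remaining internal states share names with the public enum
        return YarnApplicationAttemptState.valueOf(state.name());
    }
  }
}
{code}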

> ResourceManager IllegalArgumentException error
> --
>
> Key: YARN-4411
> URL: https://issues.apache.org/jira/browse/YARN-4411
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: yarntime
>Assignee: yarntime
> Attachments: YARN-4411.001.patch
>
>
> In version 2.7.1, line 1914 in RMAppAttemptImpl may cause an 
> IllegalArgumentException:
>   YarnApplicationAttemptState.valueOf(this.getState().toString())
> caused by this.getState() returning type RMAppAttemptState, which may not be 
> convertible to YarnApplicationAttemptState.
> {noformat}
> java.lang.IllegalArgumentException: No enum constant 
> org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.LAUNCHED_UNMANAGED_SAVING
> at java.lang.Enum.valueOf(Enum.java:236)
> at 
> org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.valueOf(YarnApplicationAttemptState.java:27)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.createApplicationAttemptReport(RMAppAttemptImpl.java:1870)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationAttemptReport(ClientRMService.java:355)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationAttemptReport(ApplicationClientProtocolPBServiceImpl.java:355)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
> {noformat}



--
This message was sent by Atlassian JIRA

[jira] [Commented] (YARN-4411) ResourceManager IllegalArgumentException error

2015-12-06 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15043878#comment-15043878
 ] 

Naganarasimha G R commented on YARN-4411:
-

You can't avoid them, as they are caused by the existing code.

> ResourceManager IllegalArgumentException error
> --
>
> Key: YARN-4411
> URL: https://issues.apache.org/jira/browse/YARN-4411
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: yarntime
>Assignee: yarntime
> Attachments: YARN-4411.001.patch
>
>
> In version 2.7.1, line 1914 in RMAppAttemptImpl may cause an 
> IllegalArgumentException:
>   YarnApplicationAttemptState.valueOf(this.getState().toString())
> caused by this.getState() returning type RMAppAttemptState, which may not be 
> convertible to YarnApplicationAttemptState.
> {noformat}
> java.lang.IllegalArgumentException: No enum constant 
> org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.LAUNCHED_UNMANAGED_SAVING
> at java.lang.Enum.valueOf(Enum.java:236)
> at 
> org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.valueOf(YarnApplicationAttemptState.java:27)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.createApplicationAttemptReport(RMAppAttemptImpl.java:1870)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationAttemptReport(ClientRMService.java:355)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationAttemptReport(ApplicationClientProtocolPBServiceImpl.java:355)
> at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue

2015-12-06 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044307#comment-15044307
 ] 

Sunil G commented on YARN-4416:
---

I also agree that we need to make the ordering policy independent. But a 
fail-fast iterator will also be a problem, as we have an open loophole to 
change some contents in SchedulableEntity. A discussion along the same lines 
took place with Jian while doing the priority work, and we dropped the plan to 
have locks inside the ordering policy due to the tight coupling with 
LeafQueue. Looping in [~jianhe] to the thread as well.

> Deadlock due to synchronised get Methods in AbstractCSQueue
> ---
>
> Key: YARN-4416
> URL: https://issues.apache.org/jira/browse/YARN-4416
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.1
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, 
> deadlock.log
>
>
> While debugging in Eclipse, I came across a scenario where I had to find out 
> the name of the queue, but every time I tried to inspect the queue it hung. 
> On seeing the stack I realized there was a deadlock, but on analysis I found 
> it was only due to *queue.toString()* during debugging, as 
> {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized.
> Still, I feel {{AbstractCSQueue}}'s getter methods need not be synchronized 
> and would be better handled through read and write locks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient

2015-12-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044081#comment-15044081
 ] 

Hadoop QA commented on YARN-3367:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
59s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 16s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 33s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
34s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s 
{color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
30s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 24s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in 
feature-YARN-2928 has 3 extant Findbugs warnings. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 31s 
{color} | {color:red} hadoop-yarn-common in feature-YARN-2928 failed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
0s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 11s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 11s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 30s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 30s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 31s 
{color} | {color:red} Patch generated 6 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 52, now 58). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
27s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
37s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 29s 
{color} | {color:red} hadoop-yarn-common in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 59s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 5s {color} | 
{color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 13s {color} 
| {color:red} hadoop-yarn-common in the patch failed with JDK v1.7.0_85. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 31s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
v1.7.0_85. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
23s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | 

[jira] [Commented] (YARN-4381) Add container launchEvent and container localizeFailed metrics in container

2015-12-06 Thread Lin Yiqun (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044261#comment-15044261
 ] 

Lin Yiqun commented on YARN-4381:
-

[~djp], do you have some time to review my patch, or is there anything else I 
can do for this JIRA?

> Add container launchEvent and container localizeFailed metrics in container
> ---
>
> Key: YARN-4381
> URL: https://issues.apache.org/jira/browse/YARN-4381
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
> Attachments: YARN-4381.001.patch
>
>
> Recently, I found an issue with the NodeManager metrics: 
> {{NodeManagerMetrics#containersLaunched}} does not actually count the number 
> of successfully launched containers, because a container launch can fail on 
> receiving a kill command or on a container localization failure, which leads 
> to a failed container. But currently this counter is increased in the code 
> below whether the container starts successfully or fails.
> {code}
> Credentials credentials = parseCredentials(launchContext);
> Container container =
> new ContainerImpl(getConfig(), this.dispatcher,
> context.getNMStateStore(), launchContext,
>   credentials, metrics, containerTokenIdentifier);
> ApplicationId applicationID =
> containerId.getApplicationAttemptId().getApplicationId();
> if (context.getContainers().putIfAbsent(containerId, container) != null) {
>   NMAuditLogger.logFailure(user, AuditConstants.START_CONTAINER,
> "ContainerManagerImpl", "Container already running on this node!",
> applicationID, containerId);
>   throw RPCUtil.getRemoteException("Container " + containerIdStr
>   + " already is running on this node!!");
> }
> this.readLock.lock();
> try {
>   if (!serviceStopped) {
> // Create the application
> Application application =
> new ApplicationImpl(dispatcher, user, applicationID, credentials, 
> context);
> if (null == context.getApplications().putIfAbsent(applicationID,
>   application)) {
>   LOG.info("Creating a new application reference for app " + 
> applicationID);
>   LogAggregationContext logAggregationContext =
>   containerTokenIdentifier.getLogAggregationContext();
>   Map<ApplicationAccessType, String> appAcls =
>   container.getLaunchContext().getApplicationACLs();
>   context.getNMStateStore().storeApplication(applicationID,
>   buildAppProto(applicationID, user, credentials, appAcls,
> logAggregationContext));
>   dispatcher.getEventHandler().handle(
> new ApplicationInitEvent(applicationID, appAcls,
>   logAggregationContext));
> }
> this.context.getNMStateStore().storeContainer(containerId, request);
> dispatcher.getEventHandler().handle(
>   new ApplicationContainerInitEvent(container));
> 
> this.context.getContainerTokenSecretManager().startContainerSuccessful(
>   containerTokenIdentifier);
> NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER,
>   "ContainerManageImpl", applicationID, containerId);
> // TODO launchedContainer misplaced -> doesn't necessarily mean a 
> container
> // launch. A finished Application will not launch containers.
> metrics.launchedContainer();
> metrics.allocateContainer(containerTokenIdentifier.getResource());
>   } else {
> throw new YarnException(
> "Container start failed as the NodeManager is " +
> "in the process of shutting down");
>   }
> {code}
> In addition, we lack a localizationFailed metric for containers.
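
For illustration, a hedged sketch of what such a counter could look like in 
the hadoop-metrics2 style that NodeManagerMetrics already uses; the class, 
field, and method names for the new metric are assumptions:
{code}
// Hedged sketch (illustrative names): a localization-failure counter in the
// hadoop-metrics2 style used by NodeManagerMetrics.
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableCounterInt;

@Metrics(about = "Sketch of NodeManager metrics", context = "yarn")
public class NodeManagerMetricsSketch {
  @Metric("# of container localization failures")
  MutableCounterInt containersLocalizeFailed;

  // Call this from the localization-failed path instead of counting the
  // container as launched.
  public void localizeFailedContainer() {
    containersLocalizeFailed.incr();
  }
}
{code}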



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding blocking ZK's event thread

2015-12-06 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044258#comment-15044258
 ] 

Tsuyoshi Ozawa commented on YARN-4348:
--

[~jianhe] could you take a look?

> ZKRMStateStore.syncInternal shouldn't wait for sync completion for avoiding 
> blocking ZK's event thread
> --
>
> Key: YARN-4348
> URL: https://issues.apache.org/jira/browse/YARN-4348
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.2, 2.6.2
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
>Priority: Blocker
> Attachments: YARN-4348-branch-2.7.002.patch, 
> YARN-4348-branch-2.7.003.patch, YARN-4348-branch-2.7.004.patch, 
> YARN-4348.001.patch, YARN-4348.001.patch, log.txt
>
>
> Jian mentioned that the current internal ZK configuration of ZKRMStateStore 
> can cause the following situation:
> 1. syncInternal times out,
> 2. but the sync succeeds later on.
> We should use zkResyncWaitTime as the timeout value.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics

2015-12-06 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044310#comment-15044310
 ] 

Sunil G commented on YARN-4304:
---

Thank you [~Naganarasimha] for helping verify the patch. Yes, I will be 
handling Wangda's suggestion in another ticket and have provided a patch 
there. Once that's resolved, we'll remove the LeafQueue dependency here and 
depend only on ResourceUsage, as you suggested. Thank you.

> AM max resource configuration per partition to be displayed/updated correctly 
> in UI and in various partition related metrics
> 
>
> Key: YARN-4304
> URL: https://issues.apache.org/jira/browse/YARN-4304
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4304.patch, 0002-YARN-4304.patch, 
> 0003-YARN-4304.patch, 0004-YARN-4304.patch, REST_and_UI.zip
>
>
> As we are supporting per-partition max AM resource percentage configuration, 
> the UI and various metrics also need to display the correct configurations 
> related to it.
> For example, the current UI still shows the am-resource percentage at the 
> queue level. This is to be updated correctly when label configuration is 
> used.
> - Display max-am-percentage per partition in the Scheduler UI (for labels as 
> well) and on the ClusterMetrics page
> - Update queue/partition related metrics w.r.t. per-partition 
> am-resource-percentage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4426) unhealthy disk makes NM LOST

2015-12-06 Thread sandflee (JIRA)
sandflee created YARN-4426:
--

 Summary: unhealthy disk makes NM LOST
 Key: YARN-4426
 URL: https://issues.apache.org/jira/browse/YARN-4426
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: sandflee


The NM hangs because mkdir hangs in the DiskHealthMonitor-Timer thread, and 
the NodeStatusUpdater cannot acquire the sync lock in getNodeStatus. The 
relevant stack traces are below, followed by a minimal sketch of the 
contention pattern.


"DiskHealthMonitor-Timer" daemon prio=10 tid=0x7f4b3d867000 nid=0x50c8 
runnable [0x7f4b27ef9000]
   java.lang.Thread.State: RUNNABLE
at java.io.UnixFileSystem.createDirectory(Native Method)
at java.io.File.mkdir(File.java:1310)
at 
org.apache.hadoop.util.DiskChecker.mkdirsWithExistsCheck(DiskChecker.java:67)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:90)
at 
org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.verifyDirUsingMkdir(DirectoryCollection.java:338)
at 
org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.testDirs(DirectoryCollection.java:310)
at 
org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.checkDirs(DirectoryCollection.java:230)
- locked <0xf8970408> (a 
org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection)
at 
org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.checkDirs(LocalDirsHandlerService.java:361)
at 
org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.access$400(LocalDirsHandlerService.java:51)
at 
org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService$MonitoringTimerTask.run(LocalDirsHandlerService.java:123)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)


"Node Status Updater" prio=10 tid=0x7f4b3cd6d800 nid=0x4af5 waiting for 
monitor entry [0x7f4b1c141000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.getFailedDirs(DirectoryCollection.java:170)
- waiting to lock <0xf8970408> (a 
org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection)
at 
org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getDisksHealthReport(LocalDirsHandlerService.java:259)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeHealthCheckerService.getHealthReport(NodeHealthCheckerService.java:58)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.getNodeStatus(NodeStatusUpdaterImpl.java:365)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.access$100(NodeStatusUpdaterImpl.java:77)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl$1.run(NodeStatusUpdaterImpl.java:588)
at java.lang.Thread.run(Thread.java:745)


"AsyncDispatcher event handler" prio=10 tid=0x7f4b3da24000 nid=0x50d9 
waiting for monitor entry [0x7f4b245b6000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection.getGoodDirs(DirectoryCollection.java:163)
- waiting to lock <0xf8970408> (a 
org.apache.hadoop.yarn.server.nodemanager.DirectoryCollection)
at 
org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalDirsForCleanup(LocalDirsHandlerService.java:229)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.handleCleanupContainerResources(ResourceLocalizationService.java:497)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.handle(ResourceLocalizationService.java:395)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.handle(ResourceLocalizationService.java:134)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:191)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:124)
at java.lang.Thread.run(Thread.java:745)
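
For illustration, a hedged sketch of the contention pattern (not the actual 
DirectoryCollection code): one synchronized method that hangs on disk I/O 
holds the object monitor, so every other synchronized method on the same 
object blocks behind it, exactly as in the traces above.
{code}
// Hedged sketch of the hang (not the actual DirectoryCollection code): a
// synchronized disk check that blocks on I/O holds the object monitor, so the
// synchronized getter used by the heartbeat path blocks too, and the NM
// eventually gets marked LOST.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;

class DirectoryCollectionSketch {
  // Runs from a timer thread; mkdir on an unhealthy disk can hang here.
  synchronized void checkDirs(Path probeDir) throws IOException {
    Files.createDirectories(probeDir);
  }

  // Called on the node-status (heartbeat) path; blocks while checkDirs holds
  // the monitor.
  synchronized List<String> getFailedDirs() {
    return Collections.emptyList();
  }
}
{code}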




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4424) YARN CLI command hangs

2015-12-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044196#comment-15044196
 ] 

Hadoop QA commented on YARN-4424:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
5s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
16s {color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 23s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in trunk failed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
36s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
22s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 23s 
{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed 
with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s 
{color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 48s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 35s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_85. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
27s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 149m 13s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_85 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  

[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue

2015-12-06 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044316#comment-15044316
 ] 

Naganarasimha G R commented on YARN-4416:
-

[~sunilg],
Hmm, true, but is there any other way to avoid synchronized locks for the get 
APIs? I feel that's really not good: the web UI, CLI, and REST all access the 
Queue to get information, and if there is a problem elsewhere, the main 
scheduler thread can get stuck. We can also hit unexpected deadlocks for read 
calls, like the one in the attached stack trace.
Can Read/Write locks in the leaf queue be an option?
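
For illustration, a minimal sketch of that option, assuming the getters are 
plain field reads (the class and field names below are stand-ins, not the 
actual AbstractCSQueue members):
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch: replace synchronized getters with a ReentrantReadWriteLock so that
// web UI/CLI/REST readers no longer serialize behind each other; only
// mutations take the exclusive write lock.
class QueueLockSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private float absoluteUsedCapacity;

  float getAbsoluteUsedCapacity() { // was: synchronized
    lock.readLock().lock();
    try {
      return absoluteUsedCapacity;
    } finally {
      lock.readLock().unlock();
    }
  }

  void setAbsoluteUsedCapacity(float value) {
    lock.writeLock().lock();
    try {
      absoluteUsedCapacity = value;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}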

> Deadlock due to synchronised get Methods in AbstractCSQueue
> ---
>
> Key: YARN-4416
> URL: https://issues.apache.org/jira/browse/YARN-4416
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.1
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-4416.v1.001.patch, YARN-4416.v1.002.patch, 
> deadlock.log
>
>
> While debugging in Eclipse, I came across a scenario where I had to find 
> out the name of the queue, but every time I tried to inspect the queue it 
> hung. On seeing the stack I realized there was a deadlock, but on analysis 
> found out that it was only due to *queue.toString()* during debugging, as 
> {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized.
> Still, I feel {{AbstractCSQueue}}'s getter methods need not be synchronized 
> and would better be handled through read and write locks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4424) YARN CLI command hangs

2015-12-06 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-4424:
--
Attachment: YARN-4424.1.patch

The patch removes the read lock in RMAppImpl#getFinalApplicationStatus, as I 
think it is not required there.
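
A minimal sketch of why removing it can be safe, assuming the status lives in 
a single (here volatile) field; the class and field names are stand-ins, not 
the actual RMAppImpl code:
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Before: the getter can block forever behind a writer that is itself stuck
// (the hang analyzed on this issue). After: a volatile read of a single
// reference is atomic and visible without taking any lock.
class AppStatusSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private volatile String finalStatus = "UNDEFINED"; // stand-in for FinalApplicationStatus

  String getFinalStatusWithLock() { // before
    lock.readLock().lock();
    try {
      return finalStatus;
    } finally {
      lock.readLock().unlock();
    }
  }

  String getFinalStatusLockFree() { // after
    return finalStatus;
  }
}
{code}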

> YARN CLI command hangs
> --
>
> Key: YARN-4424
> URL: https://issues.apache.org/jira/browse/YARN-4424
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Jian He
>Priority: Blocker
> Attachments: YARN-4424.1.patch
>
>
> {code}
> yarn@XXX:/mnt/hadoopqe$ /usr/hdp/current/hadoop-yarn-client/bin/yarn 
> application -list -appStates NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING
> 15/12/04 21:59:54 INFO impl.TimelineClientImpl: Timeline service address: 
> http://XXX:8188/ws/v1/timeline/
> 15/12/04 21:59:54 INFO client.RMProxy: Connecting to ResourceManager at 
> XXX/0.0.0.0:8050
> 15/12/04 21:59:55 INFO client.AHSProxy: Connecting to Application History 
> server at XXX/0.0.0.0:10200
> {code}
> {code:title=RM log}
> 2015-12-04 21:59:19,744 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 237000
> 2015-12-04 22:00:50,945 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 238000
> 2015-12-04 22:02:22,416 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 239000
> 2015-12-04 22:03:53,593 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 240000
> 2015-12-04 22:05:24,856 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 241000
> 2015-12-04 22:06:56,235 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 242000
> 2015-12-04 22:08:27,510 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 243000
> 2015-12-04 22:09:58,786 INFO  event.AsyncDispatcher 
> (AsyncDispatcher.java:handle(243)) - Size of event-queue is 244000
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4424) YARN CLI command hangs

2015-12-06 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044141#comment-15044141
 ] 

Jian He commented on YARN-4424:
---

This is a similar problem to YARN-2594.
Thread 1:
{code}
Thread 53785: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
 - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=186 (Interpreted frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() @bci=1, line=834 (Interpreted frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(int) @bci=83, line=964 (Interpreted frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(int) @bci=10, line=1282 (Interpreted frame)
 - java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock() @bci=5, line=731 (Interpreted frame)
 - org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.getFinalApplicationStatus() @bci=4, line=478 (Interpreted frame)
 - org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.appAttemptFinished(org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttempt, org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptState, org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMApp, long) @bci=45, line=162 (Interpreted frame)
 - org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$BaseFinalTransition.transition(org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl, org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEvent) @bci=288, line=1300 (Interpreted frame)
 - org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalTransition.transition(org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl, org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEvent) @bci=9, line=1493 (Interpreted frame)
 - org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalTransition.transition(java.lang.Object, java.lang.Object) @bci=9, line=1480 (Interpreted frame)
 - org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalStateSavedTransition.transition(org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl, org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEvent) @bci=24, line=1213 (Interpreted frame)
 - org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$FinalStateSavedTransition.transition(java.lang.Object, java.lang.Object) @bci=9, line=1205 (Interpreted frame)
 - org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(java.lang.Object, java.lang.Enum, java.lang.Object, java.lang.Enum) @bci=6, line=385 (Interpreted frame)
 - org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(java.lang.Object, java.lang.Enum, java.lang.Enum, java.lang.Object) @bci=45, line=302 (Interpreted frame)
 - org.apache.hadoop.yarn.state.StateMachineFactory.access$300(org.apache.hadoop.yarn.state.StateMachineFactory, java.lang.Object, java.lang.Enum, java.lang.Enum, java.lang.Object) @bci=6, line=46 (Interpreted frame)
 - org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(java.lang.Enum, java.lang.Object) @bci=15, line=448 (Interpreted frame)
 - org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEvent) @bci=65, line=784 (Interpreted frame)
 - org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(org.apache.hadoop.yarn.event.Event) @bci=5, line=106 (Interpreted frame)
 - org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEvent) @bci=53, line=815 (Interpreted frame)
 - org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(org.apache.hadoop.yarn.event.Event) @bci=5, line=796 (Interpreted frame)
 - org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(org.apache.hadoop.yarn.event.Event) @bci=88, line=183 (Interpreted frame)
 - org.apache.hadoop.yarn.event.AsyncDispatcher$1.run() @bci=140, line=109 (Interpreted frame)
{code}
Thread 2:
{code}
Thread 25723: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
 - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=186 (Interpreted frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() @bci=1, line=834 (Interpreted frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(int) @bci=83, line=964 

[jira] [Updated] (YARN-2885) Create AMRMProxy request interceptor for distributed scheduling decisions for queueable containers

2015-12-06 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-2885:
--
Attachment: YARN-2885-yarn-2877.full-2.patch

Updating patch:

# The earlier patch had changed the containerId generation scheme such that 
containers generated by the RM would be even numbers and those by the NM would 
be odd. Unfortunately, that requires too many test case changes. The latest 
patch uses a new scheme where containerIds generated by the RM remain as is, 
but those generated by the NM are negative (decremented by 1 each time; see 
the sketch after this list).
# Added more test cases to the LocalScheduler
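
A minimal sketch of that allocation scheme as described (the class and method 
names are illustrative, not from the patch):
{code}
import java.util.concurrent.atomic.AtomicLong;

// RM-issued ids keep counting up from 1 as before, while NM-issued ids count
// down from -1, so the two ranges can never collide and existing RM-side
// test expectations stay valid.
class ContainerIdAllocatorSketch {
  private final AtomicLong rmCounter = new AtomicLong(0); // 1, 2, 3, ...
  private final AtomicLong nmCounter = new AtomicLong(0); // -1, -2, -3, ...

  long nextRmContainerId() {
    return rmCounter.incrementAndGet();
  }

  long nextNmContainerId() {
    return nmCounter.decrementAndGet();
  }
}
{code}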

> Create AMRMProxy request interceptor for distributed scheduling decisions for 
> queueable containers
> --
>
> Key: YARN-2885
> URL: https://issues.apache.org/jira/browse/YARN-2885
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Arun Suresh
> Attachments: YARN-2885-yarn-2877.001.patch, 
> YARN-2885-yarn-2877.002.patch, YARN-2885-yarn-2877.full-2.patch, 
> YARN-2885-yarn-2877.full.patch, YARN-2885_api_changes.patch
>
>
> We propose to add a Local ResourceManager (LocalRM) to the NM in order to 
> support distributed scheduling decisions. 
> Architecturally we leverage the RMProxy, introduced in YARN-2884. 
> The LocalRM makes distributed decisions for queueable container requests. 
> Guaranteed-start requests are still handled by the central RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4309) Add debug information to application logs when a container fails

2015-12-06 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-4309:

Attachment: YARN-4309.007.patch

Uploaded a new patch with clarifications on following symlinks in the comments 
and in yarn-default.xml.

> Add debug information to application logs when a container fails
> 
>
> Key: YARN-4309
> URL: https://issues.apache.org/jira/browse/YARN-4309
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4309.001.patch, YARN-4309.002.patch, 
> YARN-4309.003.patch, YARN-4309.004.patch, YARN-4309.005.patch, 
> YARN-4309.006.patch, YARN-4309.007.patch
>
>
> Sometimes when a container fails, it can be pretty hard to figure out why it 
> failed.
> My proposal is that if a container fails, we collect information about the 
> container local dir and dump it into the container log dir. Ideally, I'd like 
> to tar up the directory entirely, but I'm not sure of the security and space 
> implications of such an approach. At the very least, we can list all the 
> files in the container local dir, and dump the contents of 
> launch_container.sh (into the container log dir).
> When log aggregation occurs, all this information will automatically get 
> collected and make debugging such failures much easier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4309) Add debug information to application logs when a container fails

2015-12-06 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044404#comment-15044404
 ] 

Wangda Tan commented on YARN-4309:
--

[~vvasudev],
Thanks for the reply, makes sense to me.

A few comments:
- Could you make sure the container process will still be launched even if the 
copy-script or list-folder command fails?
- Could you add an echo command (something like echo "Printing container launch 
debug info...") to container_launch.sh, after the following "if"? (See the 
sketch after this list.)
{code}
if (getConf() != null && getConf().getBoolean(
    YarnConfiguration.NM_LOG_CONTAINER_DEBUG_INFO,
    YarnConfiguration.DEFAULT_NM_LOG_CONTAINER_DEBUG_INFO)) {
{code}
- Add a test to verify that the log aggregation result contains such debugging 
output?
- Could you upload a sample container_launch.sh for easier review?
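
To make the echo suggestion concrete, a hypothetical sketch of how the 
generated script could be assembled when the flag is on (this is not the 
actual patch; the file names and commands are illustrative):
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// When the debug-info flag is on, the generated launch script echoes a
// marker first, then dumps the debug information; the trailing '|| true'
// defensively keeps a failing list/copy step from aborting the launch.
public class LaunchScriptSketch {
  public static void main(String[] args) throws IOException {
    boolean debugInfoEnabled = true; // stand-in for the config check above
    StringBuilder script = new StringBuilder("#!/bin/bash\n");
    if (debugInfoEnabled) {
      script.append("echo \"Printing container launch debug info...\"\n");
      script.append("ls -l . > \"$LOG_DIR/directory.info\" || true\n");
      script.append("cp launch_container.sh \"$LOG_DIR/\" || true\n");
    }
    script.append("exec bash ./real_container_command.sh\n");
    Files.write(Paths.get("launch_container.sh"), script.toString().getBytes());
  }
}
{code}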

> Add debug information to application logs when a container fails
> 
>
> Key: YARN-4309
> URL: https://issues.apache.org/jira/browse/YARN-4309
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4309.001.patch, YARN-4309.002.patch, 
> YARN-4309.003.patch, YARN-4309.004.patch, YARN-4309.005.patch, 
> YARN-4309.006.patch, YARN-4309.007.patch
>
>
> Sometimes when a container fails, it can be pretty hard to figure out why it 
> failed.
> My proposal is that if a container fails, we collect information about the 
> container local dir and dump it into the container log dir. Ideally, I'd like 
> to tar up the directory entirely, but I'm not sure of the security and space 
> implications of such an approach. At the very least, we can list all the 
> files in the container local dir, and dump the contents of 
> launch_container.sh (into the container log dir).
> When log aggregation occurs, all this information will automatically get 
> collected and make debugging such failures much easier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4309) Add debug information to application logs when a container fails

2015-12-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044406#comment-15044406
 ] 

Hadoop QA commented on YARN-4309:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
34s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 52s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 9s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 28s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
38s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 35s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 44s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 44s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 3s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 26s 
{color} | {color:red} Patch generated 4 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 358, now 359). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 29s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
37s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
50s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 20s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 35s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 22s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 51s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 32s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 6s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 7s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_91. {color} |
| 

[jira] [Commented] (YARN-3542) Re-factor support for CPU as a resource using the new ResourceHandler mechanism

2015-12-06 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044361#comment-15044361
 ] 

Varun Vasudev commented on YARN-3542:
-

All the existing configurations will continue to work as is. The patch adds a 
new configuration -
{code}
yarn.nodemanager.resource.cpu.enabled
{code}

which, if set to true, will create the cpu handler as part of the resource 
handler chain. None of the other configurations change. If both 
yarn.nodemanager.resource.cpu.enabled and 
yarn.nodemanager.linux-container-executor.resources-handler.class are set, you 
end up in a situation where both objects modify the same file.
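
As a sketch of how I understand the wiring (not the actual patch code; the 
types below are stand-ins):
{code}
import java.util.ArrayList;
import java.util.List;

// The CPU resource handler joins the chain only when the new flag is true,
// leaving every existing configuration untouched.
class ResourceHandlerChainSketch {
  interface ResourceHandler { /* lifecycle methods elided */ }

  static class CpuResourceHandlerStub implements ResourceHandler { }

  static List<ResourceHandler> buildChain(boolean cpuEnabled) {
    List<ResourceHandler> chain = new ArrayList<>();
    if (cpuEnabled) { // yarn.nodemanager.resource.cpu.enabled
      chain.add(new CpuResourceHandlerStub());
    }
    return chain;
  }
}
{code}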

> Re-factor support for CPU as a resource using the new ResourceHandler 
> mechanism
> ---
>
> Key: YARN-3542
> URL: https://issues.apache.org/jira/browse/YARN-3542
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Sidharta Seethana
>Assignee: Sidharta Seethana
>Priority: Critical
> Attachments: YARN-3542.001.patch, YARN-3542.002.patch
>
>
> In YARN-3443 , a new ResourceHandler mechanism was added which enabled easier 
> addition of new resource types in the nodemanager (this was used for network 
> as a resource - See YARN-2140 ). We should refactor the existing CPU 
> implementation ( LinuxContainerExecutor/CgroupsLCEResourcesHandler ) using 
> the new ResourceHandler mechanism. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4415) Scheduler Web Ui shows max capacity for the queue is 100% but when we submit application doesnt get assigned

2015-12-06 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044498#comment-15044498
 ] 

Xianyin Xin commented on YARN-4415:
---

Sorry for the delay, [~Naganarasimha].
I'm not sure I understand correctly, so please correct me if I'm wrong. There 
are now two cases: 1) we set the access-labels for a queue in the xml, and 2) 
we didn't set the access-labels for a queue. For case 1), the access-labels 
and the configured capacities (0 for capacity and 100 for max capacity by 
default) are imported; for case 2), the access-labels of the queue are 
inherited from its parent, but the capacities of those labels are 0, since 
{{setupConfigurableCapacities()}} only considers the access-labels configured 
in the xml.
{code}
this.accessibleLabels =
csContext.getConfiguration().getAccessibleNodeLabels(getQueuePath());
this.defaultLabelExpression = csContext.getConfiguration()
.getDefaultNodeLabelExpression(getQueuePath());

// inherit from parent if labels not set
if (this.accessibleLabels == null && parent != null) {
  this.accessibleLabels = parent.getAccessibleNodeLabels();
}

// inherit from parent if labels not set
if (this.defaultLabelExpression == null && parent != null
&& this.accessibleLabels.containsAll(parent.getAccessibleNodeLabels())) 
{
  this.defaultLabelExpression = parent.getDefaultNodeLabelExpression();
}

// After we setup labels, we can setup capacities
setupConfigurableCapacities();
{code}

This would cause confusion because the access-labels inherited from the parent 
have 0 max capacities. If that is indeed the case, I agree that the inherited 
access-labels should have 100 max capacity by default.

But for the two scenarios in the description, I feel the final result is 
reasonable, because you didn't set the access-labels for the queue and its 
parent doesn't have the access-labels either, so the label is not explicitly 
accessible by the queue. However, the info that the web UI shows is wrong if 
the above analysis is right. I think the cause is the following statement in 
{{QueueCapacitiesInfo.java}},

{code}
if (maxCapacity < CapacitySchedulerQueueInfo.EPSILON || maxCapacity > 1f)
  maxCapacity = 1f;
{code}
where it sets {{maxCapacity}} to 1 for the case {{maxCapacity == 0}}, which is 
exactly case 2) above.
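
To make the failure mode concrete, a standalone restatement of that clamp 
(the EPSILON value here is a stand-in, not the real constant):
{code}
// An unconfigured max capacity of 0 falls below EPSILON and is clamped to
// 1.0f, so the web UI reports 100% while the scheduler still enforces 0.
public class MaxCapacityDisplaySketch {
  static final float EPSILON = 1e-8f; // hypothetical value

  static float displayedMaxCapacity(float maxCapacity) {
    if (maxCapacity < EPSILON || maxCapacity > 1f) {
      maxCapacity = 1f; // case 2) above lands here
    }
    return maxCapacity;
  }

  public static void main(String[] args) {
    System.out.println(displayedMaxCapacity(0f)); // prints 1.0
  }
}
{code}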

cc [~leftnoteasy].

> Scheduler Web Ui shows max capacity for the queue is 100% but when we submit 
> application doesnt get assigned
> 
>
> Key: YARN-4415
> URL: https://issues.apache.org/jira/browse/YARN-4415
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.2
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: App info with diagnostics info.png, screenshot-1.png
>
>
> Steps to reproduce the issue :
> Scenario 1:
> # Configure a queue(default) with accessible node labels as *
> # create a exclusive partition *xxx* and map a NM to it
> # ensure no capacities are configured for default for label xxx
> # start an RM app with queue as default and label as xxx
> # application is stuck but scheduler ui shows 100% as max capacity for that 
> queue
> Scenario 2:
> # create a nonexclusive partition *sharedPartition* and map a NM to it
> # ensure no capacities are configured for default queue
> # start an RM app with queue as *default* and label as *sharedPartition*
> # application is stuck but scheduler ui shows 100% as max capacity for that 
> queue for *sharedPartition*
> For both scenarios the cause is the same: the default max capacity and abs 
> max capacity are set to zero %.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)