[jira] [Updated] (YARN-2973) Capacity scheduler configuration ACLs not work.

2014-12-17 Thread Jimmy Song (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Song updated YARN-2973:
-
Issue Type: Improvement  (was: Bug)

> Capacity scheduler configuration ACLs not work.
> ---
>
> Key: YARN-2973
> URL: https://issues.apache.org/jira/browse/YARN-2973
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.5.0
> Environment: ubuntu 12.04, cloudera manager, cdh5.2.1
>Reporter: Jimmy Song
>Assignee: Rohith
>  Labels: acl, capacity-scheduler, yarn
>
> I followed this page to configure YARN: 
> http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html.
> I configured YARN to use the capacity scheduler by setting 
> yarn.resourcemanager.scheduler.class in yarn-site.xml to 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler, 
> then modified capacity-scheduler.xml:
> ___
> <configuration>
>   <property>
>     <name>yarn.scheduler.capacity.root.queues</name>
>     <value>default,extract,report,tool</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.state</name>
>     <value>RUNNING</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
>     <value>jcsong2, y2 </value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
>     <value>jcsong2, y2 </value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.default.capacity</name>
>     <value>35</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.extract.acl_submit_applications</name>
>     <value>jcsong2 </value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.extract.acl_administer_queue</name>
>     <value>jcsong2 </value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.extract.capacity</name>
>     <value>15</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.report.acl_submit_applications</name>
>     <value>y2 </value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.report.acl_administer_queue</name>
>     <value>y2 </value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.report.capacity</name>
>     <value>35</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.tool.acl_submit_applications</name>
>     <value> </value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.tool.acl_administer_queue</name>
>     <value> </value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.tool.capacity</name>
>     <value>15</value>
>   </property>
> </configuration>
> ___
> I have enabled ACLs in yarn-site.xml, but the user jcsong2 can submit 
> applications to every queue. The queue ACLs don't work! And the queue uses more 
> capacity than it was configured for! 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2973) Capacity scheduler configuration ACLs not work.

2014-12-17 Thread Jimmy Song (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249642#comment-14249642
 ] 

Jimmy Song commented on YARN-2973:
--

In the capacity scheduler configuration, a child queue inherits the ACLs of the 
root queue: a child queue's effective ACL is the UNION of its own ACL and the 
root queue's ACL. So the child-queue ACL settings have no effect if the root 
queue is not configured, in other words if the root queue is left at its default 
configuration, which allows any user to submit applications.
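
As an illustration of this comment, a minimal sketch of the implied workaround: tighten the ACLs on the root queue itself (a single-space value means "nobody") so the union with the leaf-queue ACLs no longer lets every user through. The property names match the capacity-scheduler.xml entries quoted below; the values are examples only, not part of this issue.
{code}
  <!-- root defaults to "*" (everyone); tighten it so leaf-queue ACLs take effect -->
  <property>
    <name>yarn.scheduler.capacity.root.acl_submit_applications</name>
    <value> </value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.acl_administer_queue</name>
    <value> </value>
  </property>
{code}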

> Capacity scheduler configuration ACLs not work.
> ---
>
> Key: YARN-2973
> URL: https://issues.apache.org/jira/browse/YARN-2973
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.5.0
> Environment: ubuntu 12.04, cloudera manager, cdh5.2.1
>Reporter: Jimmy Song
>Assignee: Rohith
>  Labels: acl, capacity-scheduler, yarn
>
> I followed this page to configure YARN: 
> http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html.
> I configured YARN to use the capacity scheduler by setting 
> yarn.resourcemanager.scheduler.class in yarn-site.xml to 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler, 
> then modified capacity-scheduler.xml:
> ___
> <configuration>
>   <property>
>     <name>yarn.scheduler.capacity.root.queues</name>
>     <value>default,extract,report,tool</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.state</name>
>     <value>RUNNING</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
>     <value>jcsong2, y2 </value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
>     <value>jcsong2, y2 </value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.default.capacity</name>
>     <value>35</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.extract.acl_submit_applications</name>
>     <value>jcsong2 </value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.extract.acl_administer_queue</name>
>     <value>jcsong2 </value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.extract.capacity</name>
>     <value>15</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.report.acl_submit_applications</name>
>     <value>y2 </value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.report.acl_administer_queue</name>
>     <value>y2 </value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.report.capacity</name>
>     <value>35</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.tool.acl_submit_applications</name>
>     <value> </value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.tool.acl_administer_queue</name>
>     <value> </value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.tool.capacity</name>
>     <value>15</value>
>   </property>
> </configuration>
> ___
> I have enabled ACLs in yarn-site.xml, but the user jcsong2 can submit 
> applications to every queue. The queue ACLs don't work! And the queue uses more 
> capacity than it was configured for! 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2762) RMAdminCLI node-labels-related args should be trimmed and checked before sending to RM

2014-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249740#comment-14249740
 ] 

Hudson commented on YARN-2762:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #44 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/44/])
YARN-2762. Fixed RMAdminCLI to trim and check node-label related arguments 
before sending to RM. Contributed by Rohith Sharmaks (jianhe: rev 
c65f1b382ec5ec93dccf459dbf8b2c93c3e150ab)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java


> RMAdminCLI node-labels-related args should be trimmed and checked before 
> sending to RM
> --
>
> Key: YARN-2762
> URL: https://issues.apache.org/jira/browse/YARN-2762
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith
>Assignee: Rohith
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-2762.1.patch, YARN-2762.2.patch, YARN-2762.2.patch, 
> YARN-2762.3.patch, YARN-2762.4.patch, YARN-2762.5.patch, YARN-2762.6.patch, 
> YARN-2762.7.patch, YARN-2762.patch
>
>
> All NodeLabel argument validation is done on the server side. The same can be done 
> in RMAdminCLI so that unnecessary RPC calls can be avoided.
> And for input such as "x,y,,z,", there is no need to add an empty string; it can 
> simply be skipped.
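
For illustration only (this is not the attached YARN-2762 patch), client-side trimming and skipping of empty labels could look roughly like this:
{code}
// Illustrative only -- not the YARN-2762 patch. Splits "x,y,,z," style input,
// trims each token and silently skips empty ones before any RPC is made.
import java.util.LinkedHashSet;
import java.util.Set;

public class LabelArgs {
  public static Set<String> parseLabels(String arg) {
    Set<String> labels = new LinkedHashSet<String>();
    if (arg == null) {
      return labels;
    }
    for (String token : arg.split(",")) {
      String trimmed = token.trim();
      if (!trimmed.isEmpty()) {   // skip the empty strings produced by ",," or a trailing ","
        labels.add(trimmed);
      }
    }
    return labels;
  }
}
{code}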



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2762) RMAdminCLI node-labels-related args should be trimmed and checked before sending to RM

2014-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249746#comment-14249746
 ] 

Hudson commented on YARN-2762:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #778 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/778/])
YARN-2762. Fixed RMAdminCLI to trim and check node-label related arguments 
before sending to RM. Contributed by Rohith Sharmaks (jianhe: rev 
c65f1b382ec5ec93dccf459dbf8b2c93c3e150ab)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java
* hadoop-yarn-project/CHANGES.txt


> RMAdminCLI node-labels-related args should be trimmed and checked before 
> sending to RM
> --
>
> Key: YARN-2762
> URL: https://issues.apache.org/jira/browse/YARN-2762
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith
>Assignee: Rohith
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-2762.1.patch, YARN-2762.2.patch, YARN-2762.2.patch, 
> YARN-2762.3.patch, YARN-2762.4.patch, YARN-2762.5.patch, YARN-2762.6.patch, 
> YARN-2762.7.patch, YARN-2762.patch
>
>
> All NodeLabel argument validation is done on the server side. The same can be done 
> in RMAdminCLI so that unnecessary RPC calls can be avoided.
> And for input such as "x,y,,z,", there is no need to add an empty string; it can 
> simply be skipped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2974) GenericExceptionHandler to handle ErrorMessagesException by extracting messages

2014-12-17 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249754#comment-14249754
 ] 

Steve Loughran commented on YARN-2974:
--

The client-side response is even less useful
{code}
 Request to 
http://stevel-763.local:63771/proxy/application_1418816662063_0001/ws/v1/slider/application/live/resources
 failed with exit code 500, body length 221:
ErrorMessagesExceptioncom.sun.jersey.spi.inject.Errors$ErrorMessagesException
{code}

> GenericExceptionHandler to handle ErrorMessagesException by extracting 
> messages
> ---
>
> Key: YARN-2974
> URL: https://issues.apache.org/jira/browse/YARN-2974
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: webapp
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>
> The Jersey {{ErrorMessagesException}} supports a list of messages, and is how 
> Jersey itself builds up errors. 
> {{GenericExceptionHandler}} doesn't have special handling for it and converts 
> it to a 500, discarding all the text as it does so.
> The handler should recognise the exception, log the messages and build up the 
> response text from them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2974) GenericExceptionHandler to handle ErrorMessagesException by extracting messages

2014-12-17 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved YARN-2974.
--
Resolution: Won't Fix

The Jersey error code doesn't provide accessors to the error strings, only a 
static method to log them through the java.util logger :(

> GenericExceptionHandler to handle ErrorMessagesException by extracting 
> messages
> ---
>
> Key: YARN-2974
> URL: https://issues.apache.org/jira/browse/YARN-2974
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: webapp
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>
> The Jersey {{ErrorMessagesException}} supports a list of messages, and is how 
> Jersey itself builds up errors. 
> {{GenericExceptionHandler}} doesn't have special handling for it and converts 
> it to a 500, discarding all the text as it does so.
> The handler should recognise the exception, log the messages and build up the 
> response text from them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2974) GenericExceptionHandler to handle ErrorMessagesException by extracting messages

2014-12-17 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249753#comment-14249753
 ] 

Steve Loughran commented on YARN-2974:
--

Example of the log today
{code}
 WARN  webapp.GenericExceptionHandler 
(GenericExceptionHandler.java:toResponse(98)) - INTERNAL_SERVER_ERROR
com.sun.jersey.spi.inject.Errors$ErrorMessagesException
at 
com.sun.jersey.spi.inject.Errors.processErrorMessages(Errors.java:170)
at com.sun.jersey.spi.inject.Errors.postProcess(Errors.java:136)
at com.sun.jersey.spi.inject.Errors.processWithErrors(Errors.java:199)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.getUriRules(WebApplicationImpl.java:550)
at 
com.sun.jersey.server.impl.application.WebApplicationContext.getRules(WebApplicationContext.java:243)
at 
com.sun.jersey.server.impl.uri.rules.SubLocatorRule.accept(SubLocatorRule.java:131)
at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at 
com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at 
com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at 
com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:886)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
at 
com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
at 
com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:164)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1224)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
{code}

> GenericExceptionHandler to handle ErrorMessagesException by extracting 
> messages
> ---
>
> Key: YARN-2974
> URL: https://issues.apache.org/jira/browse/YARN-2974
> Project: H

[jira] [Created] (YARN-2974) GenericExceptionHandler to handle ErrorMessagesException by extracting messages

2014-12-17 Thread Steve Loughran (JIRA)
Steve Loughran created YARN-2974:


 Summary: GenericExceptionHandler to handle ErrorMessagesException 
by extracting messages
 Key: YARN-2974
 URL: https://issues.apache.org/jira/browse/YARN-2974
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: webapp
Affects Versions: 2.6.0
Reporter: Steve Loughran


The Jersey {{ErrorMessagesException}} supports a list of messages, and is how 
Jersey itself builds up errors. 

{{GenericExceptionHandler}} doesn't have special handling for it and converts it to 
a 500, discarding all the text as it does so.

The handler should recognise the exception, log the messages and build up the 
response text from them.
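
For illustration, a rough sketch of the kind of special-casing proposed here, not Hadoop's actual GenericExceptionHandler; as noted later in this thread, the Jersey exception does not expose its message list, so the sketch can only fall back to the exception's string form.
{code}
// Sketch only, NOT Hadoop's GenericExceptionHandler. Jersey's
// Errors$ErrorMessagesException does not expose its message list, so the best a
// generic handler can do is match on the class name and echo the string form.
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;
import javax.ws.rs.ext.ExceptionMapper;
import javax.ws.rs.ext.Provider;

@Provider
public class SketchExceptionMapper implements ExceptionMapper<Exception> {
  @Override
  public Response toResponse(Exception e) {
    String body = e.toString();                       // class name plus message, if any
    if (e.getClass().getName().endsWith("Errors$ErrorMessagesException")) {
      // the message list is not accessible; at least return something readable
      body = "Jersey injection errors: " + e;
    }
    return Response.status(Response.Status.INTERNAL_SERVER_ERROR)
        .entity(body)
        .type(MediaType.TEXT_PLAIN)
        .build();
  }
}
{code}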



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2974) GenericExceptionHandler to handle ErrorMessagesException by extracting messages

2014-12-17 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249798#comment-14249798
 ] 

Steve Loughran commented on YARN-2974:
--

Note that the output does end up on stdout/stderr; it's just not being picked up 
by log4j, and we can't extract meaningful text to send over to callers such as 
test runners.
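
As a general workaround (not something proposed in this issue), java.util.logging output can be routed into the SLF4J/log4j pipeline with the jul-to-slf4j bridge, assuming that jar is on the classpath:
{code}
// Assumes org.slf4j:jul-to-slf4j is on the classpath; general technique, not part of this JIRA.
import org.slf4j.bridge.SLF4JBridgeHandler;

public class JulBridgeSetup {
  public static void install() {
    SLF4JBridgeHandler.removeHandlersForRootLogger(); // drop the default JUL console handler
    SLF4JBridgeHandler.install();                     // route j.u.l records to SLF4J (and on to log4j)
  }
}
{code}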

> GenericExceptionHandler to handle ErrorMessagesException by extracting 
> messages
> ---
>
> Key: YARN-2974
> URL: https://issues.apache.org/jira/browse/YARN-2974
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: webapp
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>
> The Jersey {{ErrorMessagesException}} supports a list of messages, and is how 
> Jersey itself builds up errors. 
> {{GenericExceptionHandler}} doesn't have special handling for it and converts 
> it to a 500, discarding all the text as it does so.
> The handler should recognise the exception, log the messages and build up the 
> response text from them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2762) RMAdminCLI node-labels-related args should be trimmed and checked before sending to RM

2014-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249900#comment-14249900
 ] 

Hudson commented on YARN-2762:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1976 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1976/])
YARN-2762. Fixed RMAdminCLI to trim and check node-label related arguments 
before sending to RM. Contributed by Rohith Sharmaks (jianhe: rev 
c65f1b382ec5ec93dccf459dbf8b2c93c3e150ab)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java
* hadoop-yarn-project/CHANGES.txt


> RMAdminCLI node-labels-related args should be trimmed and checked before 
> sending to RM
> --
>
> Key: YARN-2762
> URL: https://issues.apache.org/jira/browse/YARN-2762
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith
>Assignee: Rohith
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-2762.1.patch, YARN-2762.2.patch, YARN-2762.2.patch, 
> YARN-2762.3.patch, YARN-2762.4.patch, YARN-2762.5.patch, YARN-2762.6.patch, 
> YARN-2762.7.patch, YARN-2762.patch
>
>
> All NodeLabel argument validation is done on the server side. The same can be done 
> in RMAdminCLI so that unnecessary RPC calls can be avoided.
> And for input such as "x,y,,z,", there is no need to add an empty string; it can 
> simply be skipped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2762) RMAdminCLI node-labels-related args should be trimmed and checked before sending to RM

2014-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249910#comment-14249910
 ] 

Hudson commented on YARN-2762:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #41 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/41/])
YARN-2762. Fixed RMAdminCLI to trim and check node-label related arguments 
before sending to RM. Contributed by Rohith Sharmaks (jianhe: rev 
c65f1b382ec5ec93dccf459dbf8b2c93c3e150ab)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java


> RMAdminCLI node-labels-related args should be trimmed and checked before 
> sending to RM
> --
>
> Key: YARN-2762
> URL: https://issues.apache.org/jira/browse/YARN-2762
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith
>Assignee: Rohith
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-2762.1.patch, YARN-2762.2.patch, YARN-2762.2.patch, 
> YARN-2762.3.patch, YARN-2762.4.patch, YARN-2762.5.patch, YARN-2762.6.patch, 
> YARN-2762.7.patch, YARN-2762.patch
>
>
> All NodeLabel argument validation is done on the server side. The same can be done 
> in RMAdminCLI so that unnecessary RPC calls can be avoided.
> And for input such as "x,y,,z,", there is no need to add an empty string; it can 
> simply be skipped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2762) RMAdminCLI node-labels-related args should be trimmed and checked before sending to RM

2014-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249953#comment-14249953
 ] 

Hudson commented on YARN-2762:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #45 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/45/])
YARN-2762. Fixed RMAdminCLI to trim and check node-label related arguments 
before sending to RM. Contributed by Rohith Sharmaks (jianhe: rev 
c65f1b382ec5ec93dccf459dbf8b2c93c3e150ab)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java


> RMAdminCLI node-labels-related args should be trimmed and checked before 
> sending to RM
> --
>
> Key: YARN-2762
> URL: https://issues.apache.org/jira/browse/YARN-2762
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith
>Assignee: Rohith
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-2762.1.patch, YARN-2762.2.patch, YARN-2762.2.patch, 
> YARN-2762.3.patch, YARN-2762.4.patch, YARN-2762.5.patch, YARN-2762.6.patch, 
> YARN-2762.7.patch, YARN-2762.patch
>
>
> All NodeLabel argument validation is done on the server side. The same can be done 
> in RMAdminCLI so that unnecessary RPC calls can be avoided.
> And for input such as "x,y,,z,", there is no need to add an empty string; it can 
> simply be skipped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2762) RMAdminCLI node-labels-related args should be trimmed and checked before sending to RM

2014-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250008#comment-14250008
 ] 

Hudson commented on YARN-2762:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1995 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1995/])
YARN-2762. Fixed RMAdminCLI to trim and check node-label related arguments 
before sending to RM. Contributed by Rohith Sharmaks (jianhe: rev 
c65f1b382ec5ec93dccf459dbf8b2c93c3e150ab)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java


> RMAdminCLI node-labels-related args should be trimmed and checked before 
> sending to RM
> --
>
> Key: YARN-2762
> URL: https://issues.apache.org/jira/browse/YARN-2762
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith
>Assignee: Rohith
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-2762.1.patch, YARN-2762.2.patch, YARN-2762.2.patch, 
> YARN-2762.3.patch, YARN-2762.4.patch, YARN-2762.5.patch, YARN-2762.6.patch, 
> YARN-2762.7.patch, YARN-2762.patch
>
>
> All NodeLabel argument validation is done on the server side. The same can be done 
> in RMAdminCLI so that unnecessary RPC calls can be avoided.
> And for input such as "x,y,,z,", there is no need to add an empty string; it can 
> simply be skipped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2972) DelegationTokenRenewer thread pool never expands

2014-12-17 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250116#comment-14250116
 ] 

Junping Du commented on YARN-2972:
--

Nice catch, [~jlowe]! 
+1. Patch looks good. Will commit it shortly.

> DelegationTokenRenewer thread pool never expands
> 
>
> Key: YARN-2972
> URL: https://issues.apache.org/jira/browse/YARN-2972
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Attachments: YARN-2972.001.patch
>
>
> DelegationTokenRenewer uses a thread pool to manage token renewals.  The 
> number of threads is configurable, but unfortunately the pool never expands 
> beyond the hardcoded initial 5 threads because we are using an unbounded 
> LinkedBlockingQueue.  ThreadPoolExecutor only grows the thread pool beyond 
> the core size when the specified queue is full.
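
A small standalone sketch (not DelegationTokenRenewer itself) of the ThreadPoolExecutor behaviour described above: with an unbounded LinkedBlockingQueue the queue never reports full, so the pool never grows past its core size.
{code}
// Standalone demo of the behaviour described above, not DelegationTokenRenewer itself.
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolGrowthDemo {
  public static void main(String[] args) throws Exception {
    // core = 5, max = 50, but the queue is unbounded...
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        5, 50, 3, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
    for (int i = 0; i < 1000; i++) {
      pool.execute(new Runnable() {
        public void run() {
          try { Thread.sleep(100); } catch (InterruptedException ignored) { }
        }
      });
    }
    // ...so extra tasks are queued and the pool never grows beyond the 5 core threads.
    System.out.println("pool size = " + pool.getPoolSize()); // prints 5
    pool.shutdown();
  }
}
{code}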



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2944) SCMStore/InMemorySCMStore is not currently compatible with ReflectionUtils#newInstance

2014-12-17 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2944:
---
Attachment: YARN-2944-trunk-v3.patch

[~kasha] V3 patch attached.

1. Added BaseSCMStoreTest with a zero-arg test case.
2. Added null check in InMemorySCMStore#serviceStop.
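
A hypothetical illustration of item 2 above; the real InMemorySCMStore fields and cleanup differ, and the "scheduler" member here is invented for the sketch.
{code}
// Hypothetical sketch of a null check in serviceStop(); field names are made up.
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import org.apache.hadoop.service.AbstractService;

public class SketchStore extends AbstractService {
  private ScheduledExecutorService scheduler;   // only created in serviceStart()

  public SketchStore() {
    super("SketchStore");
  }

  @Override
  protected void serviceStart() throws Exception {
    scheduler = Executors.newSingleThreadScheduledExecutor();
    super.serviceStart();
  }

  @Override
  protected void serviceStop() throws Exception {
    if (scheduler != null) {        // stop() may run even if start() never did
      scheduler.shutdownNow();
    }
    super.serviceStop();
  }
}
{code}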

> SCMStore/InMemorySCMStore is not currently compatible with 
> ReflectionUtils#newInstance
> --
>
> Key: YARN-2944
> URL: https://issues.apache.org/jira/browse/YARN-2944
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Minor
> Attachments: YARN-2944-trunk-v1.patch, YARN-2944-trunk-v2.patch, 
> YARN-2944-trunk-v3.patch
>
>
> Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create 
> the SCMStore service. Unfortunately the SCMStore class does not have a 
> 0-argument constructor.
> On startup, the SCM fails with the following:
> {noformat}
> 14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager 
> failed in state INITED; cause: java.lang.RuntimeException: 
> java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at java.lang.Class.getConstructor0(Class.java:2763)
> at java.lang.Class.getDeclaredConstructor(Class.java:2021)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
> ... 4 more
> 14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting 
> SharedCacheManager
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at java.lang.Class.getConstructor0(Class.java:2763)
> at java.lang.Class.getDeclaredConstructor(Class.java:2021)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
> ... 4 more
> {noformat}
> This JIRA is to add a 0-argument constructor to SCMStore.
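
To make the failure mode concrete, a small standalone sketch (not the YARN-2944 patch) showing why ReflectionUtils-style instantiation via Class#getDeclaredConstructor() needs a 0-argument constructor:
{code}
// Illustrative only, not the YARN-2944 patch: ReflectionUtils#newInstance calls
// getDeclaredConstructor() with no arguments, so the instantiated class must
// declare (or inherit) a 0-argument constructor.
public class ZeroArgDemo {
  static class NoDefaultCtor {
    NoDefaultCtor(String name) { }
  }
  static class WithDefaultCtor {
    WithDefaultCtor() { }          // what this JIRA adds to SCMStore, conceptually
  }

  public static void main(String[] args) throws Exception {
    WithDefaultCtor ok = WithDefaultCtor.class.getDeclaredConstructor().newInstance();
    System.out.println("created " + ok);
    // The next line throws java.lang.NoSuchMethodException: ...NoDefaultCtor.<init>()
    NoDefaultCtor fails = NoDefaultCtor.class.getDeclaredConstructor().newInstance();
  }
}
{code}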



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2944) SCMStore/InMemorySCMStore is not currently compatible with ReflectionUtils#newInstance

2014-12-17 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2944:
---
Attachment: (was: YARN-2944-trunk-v3.patch)

> SCMStore/InMemorySCMStore is not currently compatible with 
> ReflectionUtils#newInstance
> --
>
> Key: YARN-2944
> URL: https://issues.apache.org/jira/browse/YARN-2944
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Minor
> Attachments: YARN-2944-trunk-v1.patch, YARN-2944-trunk-v2.patch, 
> YARN-2944-trunk-v3.patch
>
>
> Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create 
> the SCMStore service. Unfortunately the SCMStore class does not have a 
> 0-argument constructor.
> On startup, the SCM fails with the following:
> {noformat}
> 14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager 
> failed in state INITED; cause: java.lang.RuntimeException: 
> java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at java.lang.Class.getConstructor0(Class.java:2763)
> at java.lang.Class.getDeclaredConstructor(Class.java:2021)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
> ... 4 more
> 14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting 
> SharedCacheManager
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at java.lang.Class.getConstructor0(Class.java:2763)
> at java.lang.Class.getDeclaredConstructor(Class.java:2021)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
> ... 4 more
> {noformat}
> This JIRA is to add a 0-argument constructor to SCMStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2944) SCMStore/InMemorySCMStore is not currently compatible with ReflectionUtils#newInstance

2014-12-17 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2944:
---
Attachment: YARN-2944-trunk-v3.patch

> SCMStore/InMemorySCMStore is not currently compatible with 
> ReflectionUtils#newInstance
> --
>
> Key: YARN-2944
> URL: https://issues.apache.org/jira/browse/YARN-2944
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Minor
> Attachments: YARN-2944-trunk-v1.patch, YARN-2944-trunk-v2.patch, 
> YARN-2944-trunk-v3.patch
>
>
> Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create 
> the SCMStore service. Unfortunately the SCMStore class does not have a 
> 0-argument constructor.
> On startup, the SCM fails with the following:
> {noformat}
> 14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager 
> failed in state INITED; cause: java.lang.RuntimeException: 
> java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at java.lang.Class.getConstructor0(Class.java:2763)
> at java.lang.Class.getDeclaredConstructor(Class.java:2021)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
> ... 4 more
> 14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting 
> SharedCacheManager
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at java.lang.Class.getConstructor0(Class.java:2763)
> at java.lang.Class.getDeclaredConstructor(Class.java:2021)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
> ... 4 more
> {noformat}
> This JIRA is to add a 0-argument constructor to SCMStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2944) SCMStore/InMemorySCMStore is not currently compatible with ReflectionUtils#newInstance

2014-12-17 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2944:
---
Attachment: (was: YARN-2944-trunk-v3.patch)

> SCMStore/InMemorySCMStore is not currently compatible with 
> ReflectionUtils#newInstance
> --
>
> Key: YARN-2944
> URL: https://issues.apache.org/jira/browse/YARN-2944
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Minor
> Attachments: YARN-2944-trunk-v1.patch, YARN-2944-trunk-v2.patch
>
>
> Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create 
> the SCMStore service. Unfortunately the SCMStore class does not have a 
> 0-argument constructor.
> On startup, the SCM fails with the following:
> {noformat}
> 14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager 
> failed in state INITED; cause: java.lang.RuntimeException: 
> java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at java.lang.Class.getConstructor0(Class.java:2763)
> at java.lang.Class.getDeclaredConstructor(Class.java:2021)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
> ... 4 more
> 14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting 
> SharedCacheManager
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at java.lang.Class.getConstructor0(Class.java:2763)
> at java.lang.Class.getDeclaredConstructor(Class.java:2021)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
> ... 4 more
> {noformat}
> This JIRA is to add a 0-argument constructor to SCMStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2944) SCMStore/InMemorySCMStore is not currently compatible with ReflectionUtils#newInstance

2014-12-17 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2944:
---
Attachment: YARN-2944-trunk-v3.patch

> SCMStore/InMemorySCMStore is not currently compatible with 
> ReflectionUtils#newInstance
> --
>
> Key: YARN-2944
> URL: https://issues.apache.org/jira/browse/YARN-2944
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Minor
> Attachments: YARN-2944-trunk-v1.patch, YARN-2944-trunk-v2.patch, 
> YARN-2944-trunk-v3.patch
>
>
> Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create 
> the SCMStore service. Unfortunately the SCMStore class does not have a 
> 0-argument constructor.
> On startup, the SCM fails with the following:
> {noformat}
> 14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager 
> failed in state INITED; cause: java.lang.RuntimeException: 
> java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at java.lang.Class.getConstructor0(Class.java:2763)
> at java.lang.Class.getDeclaredConstructor(Class.java:2021)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
> ... 4 more
> 14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting 
> SharedCacheManager
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at java.lang.Class.getConstructor0(Class.java:2763)
> at java.lang.Class.getDeclaredConstructor(Class.java:2021)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
> ... 4 more
> {noformat}
> This JIRA is to add a 0-argument constructor to SCMStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2949) Add documentation for CGroups

2014-12-17 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250172#comment-14250172
 ] 

Junping Du commented on YARN-2949:
--

Thanks [~vvasudev] for delivering a patch for this JIRA! 
The patch looks good to me overall; some minor comments:
1. There are many related configuration properties for enabling/configuring 
CGroups, and they come from either yarn-site.xml (or yarn-default.xml) or 
container-executor.cfg. Please state explicitly where the user should put each of 
these properties (a rough sketch of the split follows below).
2. I like the discussion in the section on CGroups and security, and it would be 
great to provide an example configuration without security (nice to have).
3. Several typos:
- typo "CGgroups", should be "CGroups"
- typo "In our cause", should be "in our case"

> Add documentation for CGroups
> -
>
> Key: YARN-2949
> URL: https://issues.apache.org/jira/browse/YARN-2949
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: documentation, nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: NodeManagerCgroups.html, apache-yarn-2949.0.patch
>
>
> A bunch of changes have gone into the NodeManager to allow greater use of 
> CGroups. It would be good to have a single page that documents how to setup 
> CGroups and the controls available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2618) Avoid over-allocation of disk resources

2014-12-17 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250193#comment-14250193
 ] 

Wei Yan commented on YARN-2618:
---

Thanks, [~kasha]

> Avoid over-allocation of disk resources
> ---
>
> Key: YARN-2618
> URL: https://issues.apache.org/jira/browse/YARN-2618
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wei Yan
>Assignee: Wei Yan
> Attachments: YARN-2618-1.patch, YARN-2618-2.patch, YARN-2618-3.patch, 
> YARN-2618-4.patch, YARN-2618-5.patch
>
>
> Subtask of YARN-2139. 
> This should include
> - Add API support for introducing disk I/O as the 3rd type resource.
> - NM should report this information to the RM
> - RM should consider this to avoid over-allocation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2774) shared cache service should authorize calls properly

2014-12-17 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2774:
---
Description: 
The shared cache manager (SCM) services should authorize calls properly.

Currently, the uploader service (done in YARN-2186) does not authorize calls to 
notify the SCM on newly uploaded resource. Proper security/authorization needs 
to be done in this RPC call. Also, the use/release calls (done in YARN-2188) 
and the scmAdmin commands (done in YARN-2189) are not properly authorized. The 
same applies to the SCM UI done in YARN-2203.

  was:
The shared cache manager (SCM) services should authorize calls properly.

Currently, the uploader service (done in YARN-2186) does not authorize calls to 
notify the SCM on newly uploaded resource. Proper security/authorization needs 
to be done in this RPC call. Also, the use/release calls (done in YARN-2188) 
and the scmAdmin commands (done in YARN-2189) are not properly authorized.


> shared cache service should authorize calls properly
> 
>
> Key: YARN-2774
> URL: https://issues.apache.org/jira/browse/YARN-2774
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sangjin Lee
>
> The shared cache manager (SCM) services should authorize calls properly.
> Currently, the uploader service (done in YARN-2186) does not authorize calls 
> to notify the SCM on newly uploaded resource. Proper security/authorization 
> needs to be done in this RPC call. Also, the use/release calls (done in 
> YARN-2188) and the scmAdmin commands (done in YARN-2189) are not properly 
> authorized. The same applies to the SCM UI done in YARN-2203.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2944) SCMStore/InMemorySCMStore is not currently compatible with ReflectionUtils#newInstance

2014-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250236#comment-14250236
 ] 

Hadoop QA commented on YARN-2944:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12687776/YARN-2944-trunk-v3.patch
  against trunk revision bc21a1c.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6130//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6130//console

This message is automatically generated.

> SCMStore/InMemorySCMStore is not currently compatible with 
> ReflectionUtils#newInstance
> --
>
> Key: YARN-2944
> URL: https://issues.apache.org/jira/browse/YARN-2944
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Minor
> Attachments: YARN-2944-trunk-v1.patch, YARN-2944-trunk-v2.patch, 
> YARN-2944-trunk-v3.patch
>
>
> Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create 
> the SCMStore service. Unfortunately the SCMStore class does not have a 
> 0-argument constructor.
> On startup, the SCM fails with the following:
> {noformat}
> 14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager 
> failed in state INITED; cause: java.lang.RuntimeException: 
> java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at java.lang.Class.getConstructor0(Class.java:2763)
> at java.lang.Class.getDeclaredConstructor(Class.java:2021)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
> ... 4 more
> 14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting 
> SharedCacheManager
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.<init>()
> at java.lang.Class.getConstructor0(Class.java:2763)
> at java.lang.Class.getDeclaredConstructor(Class.java:2021)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
> ... 4 more
> {noformat}
> This JIRA is to add a 0-argument constructor to SCMStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2970) NodeLabel operations in RMAdmin CLI get missing in help command.

2014-12-17 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2970:
---
Attachment: YARN-2970.patch

> NodeLabel operations in RMAdmin CLI get missing in help command.
> 
>
> Key: YARN-2970
> URL: https://issues.apache.org/jira/browse/YARN-2970
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Junping Du
>Assignee: Varun Saxena
>Priority: Minor
> Attachments: YARN-2970.patch
>
>
> NodeLabel operations in the RMAdmin CLI are missing from the help command, which 
> I noticed while debugging YARN-313; we should add them like the other commands:
> {noformat} 
> yarn rmadmin [-refreshQueues] [-refreshNodes] [-refreshResources] 
> [-refreshSuperUserGroupsConfiguration] [-refreshUserToGroupsMappings] 
> [-refreshAdminAcls] [-refreshServiceAcl] [-getGroup [username]] 
> [-updateNodeResource [NodeID] [MemSize] [vCores] ([OvercommitTimeout]) [-help 
> [cmd]]
>-refreshQueues: Reload the queues' acls, states and scheduler specific 
> properties.
> ResourceManager will reload the mapred-queues configuration 
> file.
>-refreshNodes: Refresh the hosts information at the ResourceManager.
>-refreshResources: Refresh resources of NodeManagers at the 
> ResourceManager.
>-refreshSuperUserGroupsConfiguration: Refresh superuser proxy groups 
> mappings
>-refreshUserToGroupsMappings: Refresh user-to-groups mappings
>-refreshAdminAcls: Refresh acls for administration of ResourceManager
>-refreshServiceAcl: Reload the service-level authorization policy file.
> ResoureceManager will reload the authorization policy file.
>-getGroups [username]: Get the groups which given user belongs to.
>-updateNodeResource [NodeID] [MemSize] [vCores] ([OvercommitTimeout]): 
> Update resource on specific node.
>-help [cmd]: Displays help for the given command or all commands if none 
> is specified.
>-addToClusterNodeLabels [label1,label2,label3] (label splitted by ","): 
> add to cluster node labels
>-removeFromClusterNodeLabels [label1,label2,label3] (label splitted by 
> ","): remove from cluster node labels
>-replaceLabelsOnNode [node1:port,label1,label2 node2:port,label1,label2]: 
> replace labels on nodes
>-directlyAccessNodeLabelStore: Directly access node label store, with this 
> option, all node label related operations will not connect RM. Instead, they 
> will access/modify stored node labels directly. By default, it is false 
> (access via RM). AND PLEASE NOTE: if you configured 
> yarn.node-labels.fs-store.root-dir to a local directory (instead of NFS or 
> HDFS), this option will only work when the command run on the machine where 
> RM is running.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)

2014-12-17 Thread Vijay Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vijay Bhat updated YARN-2230:
-
Attachment: YARN-2230.001.patch

> Fix description of yarn.scheduler.maximum-allocation-vcores in 
> yarn-default.xml (or code)
> -
>
> Key: YARN-2230
> URL: https://issues.apache.org/jira/browse/YARN-2230
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client, scheduler
>Affects Versions: 2.4.0
>Reporter: Adam Kawa
>Priority: Minor
> Attachments: YARN-2230.001.patch
>
>
> When a user requests more vcores than the allocation limit (e.g. 
> mapreduce.map.cpu.vcores  is larger than 
> yarn.scheduler.maximum-allocation-vcores), then 
> InvalidResourceRequestException is thrown - 
> https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
> {code}
> if (resReq.getCapability().getVirtualCores() < 0 ||
> resReq.getCapability().getVirtualCores() >
> maximumResource.getVirtualCores()) {
>   throw new InvalidResourceRequestException("Invalid resource request"
>   + ", requested virtual cores < 0"
>   + ", or requested virtual cores > max configured"
>   + ", requestedVirtualCores="
>   + resReq.getCapability().getVirtualCores()
>   + ", maxVirtualCores=" + maximumResource.getVirtualCores());
> }
> {code}
> According to documentation - yarn-default.xml 
> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml,
>  the request should be capped to the allocation limit.
> {code}
>   
> The maximum allocation for every container request at the RM,
> in terms of virtual CPU cores. Requests higher than this won't take 
> effect,
> and will get capped to this value.
> yarn.scheduler.maximum-allocation-vcores
> 32
>   
> {code}
> This means that:
> * Either documentation or code should be corrected (unless this exception is 
> handled elsewhere accordingly, but it looks that it is not).
> This behavior is confusing, because when such a job (with 
> mapreduce.map.cpu.vcores is larger than 
> yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any 
> progress. The warnings/exceptions are thrown at the scheduler (RM) side e.g.
> {code}
> 2014-06-29 00:34:51,469 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
> Invalid resource ask by application appattempt_1403993411503_0002_01
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested virtual cores < 0, or requested virtual cores > 
> max configured, requestedVirtualCores=32, maxVirtualCores=3
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:420)
> .
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:416)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
> {code}
> * IMHO, such an exception should be forwarded to client. Otherwise, it is non 
> obvious to discover why a job does not make any progress.
> The same looks to be related to memory.
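For illustration only (this is not the project's actual fix): if the documented capping 
behavior were implemented, the validation step could clamp the requested vcores to the 
configured maximum instead of throwing. A minimal, self-contained sketch of that idea, 
with hypothetical names:

{code}
// Illustrative sketch only: clamp a requested vcore count to a configured
// maximum instead of rejecting the request. Names are hypothetical and do
// not reflect the actual SchedulerUtils code.
public final class VcoreCapExample {

  /** Returns the vcores to actually allocate, capped to [1, maxVcores]. */
  static int capVirtualCores(int requestedVcores, int maxVcores) {
    if (requestedVcores < 1) {
      return 1;                                    // treat nonsensical requests as minimal
    }
    return Math.min(requestedVcores, maxVcores);   // cap, do not throw
  }

  public static void main(String[] args) {
    // A request for 32 vcores against a 3-vcore maximum would be capped to 3,
    // matching the behavior described in yarn-default.xml.
    System.out.println(capVirtualCores(32, 3));    // prints 3
  }
}
{code}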



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2970) NodeLabel operations in RMAdmin CLI get missing in help command.

2014-12-17 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2970:
---
Attachment: (was: YARN-2970.patch)

> NodeLabel operations in RMAdmin CLI get missing in help command.
> 
>
> Key: YARN-2970
> URL: https://issues.apache.org/jira/browse/YARN-2970
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Junping Du
>Assignee: Varun Saxena
>Priority: Minor
> Attachments: YARN-2970.patch
>
>
> NodeLabel operations in RMAdmin CLI get missing in help command when I am 
> debugging YARN-313, we should add them on as other cmds:
> {noformat} 
> yarn rmadmin [-refreshQueues] [-refreshNodes] [-refreshResources] 
> [-refreshSuperUserGroupsConfiguration] [-refreshUserToGroupsMappings] 
> [-refreshAdminAcls] [-refreshServiceAcl] [-getGroup [username]] 
> [-updateNodeResource [NodeID] [MemSize] [vCores] ([OvercommitTimeout]) [-help 
> [cmd]]
>-refreshQueues: Reload the queues' acls, states and scheduler specific 
> properties.
> ResourceManager will reload the mapred-queues configuration 
> file.
>-refreshNodes: Refresh the hosts information at the ResourceManager.
>-refreshResources: Refresh resources of NodeManagers at the 
> ResourceManager.
>-refreshSuperUserGroupsConfiguration: Refresh superuser proxy groups 
> mappings
>-refreshUserToGroupsMappings: Refresh user-to-groups mappings
>-refreshAdminAcls: Refresh acls for administration of ResourceManager
>-refreshServiceAcl: Reload the service-level authorization policy file.
> ResoureceManager will reload the authorization policy file.
>-getGroups [username]: Get the groups which given user belongs to.
>-updateNodeResource [NodeID] [MemSize] [vCores] ([OvercommitTimeout]): 
> Update resource on specific node.
>-help [cmd]: Displays help for the given command or all commands if none 
> is specified.
>-addToClusterNodeLabels [label1,label2,label3] (label splitted by ","): 
> add to cluster node labels
>-removeFromClusterNodeLabels [label1,label2,label3] (label splitted by 
> ","): remove from cluster node labels
>-replaceLabelsOnNode [node1:port,label1,label2 node2:port,label1,label2]: 
> replace labels on nodes
>-directlyAccessNodeLabelStore: Directly access node label store, with this 
> option, all node label related operations will not connect RM. Instead, they 
> will access/modify stored node labels directly. By default, it is false 
> (access via RM). AND PLEASE NOTE: if you configured 
> yarn.node-labels.fs-store.root-dir to a local directory (instead of NFS or 
> HDFS), this option will only work when the command run on the machine where 
> RM is running.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2970) NodeLabel operations in RMAdmin CLI get missing in help command.

2014-12-17 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-2970:
---
Attachment: YARN-2970.patch

> NodeLabel operations in RMAdmin CLI get missing in help command.
> 
>
> Key: YARN-2970
> URL: https://issues.apache.org/jira/browse/YARN-2970
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Junping Du
>Assignee: Varun Saxena
>Priority: Minor
> Attachments: YARN-2970.patch
>
>
> NodeLabel operations in RMAdmin CLI get missing in help command when I am 
> debugging YARN-313, we should add them on as other cmds:
> {noformat} 
> yarn rmadmin [-refreshQueues] [-refreshNodes] [-refreshResources] 
> [-refreshSuperUserGroupsConfiguration] [-refreshUserToGroupsMappings] 
> [-refreshAdminAcls] [-refreshServiceAcl] [-getGroup [username]] 
> [-updateNodeResource [NodeID] [MemSize] [vCores] ([OvercommitTimeout]) [-help 
> [cmd]]
>-refreshQueues: Reload the queues' acls, states and scheduler specific 
> properties.
> ResourceManager will reload the mapred-queues configuration 
> file.
>-refreshNodes: Refresh the hosts information at the ResourceManager.
>-refreshResources: Refresh resources of NodeManagers at the 
> ResourceManager.
>-refreshSuperUserGroupsConfiguration: Refresh superuser proxy groups 
> mappings
>-refreshUserToGroupsMappings: Refresh user-to-groups mappings
>-refreshAdminAcls: Refresh acls for administration of ResourceManager
>-refreshServiceAcl: Reload the service-level authorization policy file.
> ResoureceManager will reload the authorization policy file.
>-getGroups [username]: Get the groups which given user belongs to.
>-updateNodeResource [NodeID] [MemSize] [vCores] ([OvercommitTimeout]): 
> Update resource on specific node.
>-help [cmd]: Displays help for the given command or all commands if none 
> is specified.
>-addToClusterNodeLabels [label1,label2,label3] (label splitted by ","): 
> add to cluster node labels
>-removeFromClusterNodeLabels [label1,label2,label3] (label splitted by 
> ","): remove from cluster node labels
>-replaceLabelsOnNode [node1:port,label1,label2 node2:port,label1,label2]: 
> replace labels on nodes
>-directlyAccessNodeLabelStore: Directly access node label store, with this 
> option, all node label related operations will not connect RM. Instead, they 
> will access/modify stored node labels directly. By default, it is false 
> (access via RM). AND PLEASE NOTE: if you configured 
> yarn.node-labels.fs-store.root-dir to a local directory (instead of NFS or 
> HDFS), this option will only work when the command run on the machine where 
> RM is running.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2970) NodeLabel operations in RMAdmin CLI get missing in help command.

2014-12-17 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250296#comment-14250296
 ] 

Varun Saxena commented on YARN-2970:


Attaching a trivial patch. The help output would now look as shown below (in non-HA mode). 
Please note that I have grouped the {{directlyAccessNodeLabelStore}} option with the other 
node label options, as it only works together with the other node label CLI commands.
{code}
yarn rmadmin [-refreshQueues] [-refreshNodes] 
[-refreshSuperUserGroupsConfiguration] [-refreshUserToGroupsMappings] 
[-refreshAdminAcls] [-refreshServiceAcl] [-getGroup [username]] 
[[-addToClusterNodeLabels [label1,label2,label3]] [-removeFromClusterNodeLabels 
[label1,label2,label3]] [-replaceLabelsOnNode [node1:port,label1,label2 
node2:port,label1] [-directlyAccessNodeLabelStore]] [-help [cmd]]

   -refreshQueues: Reload the queues' acls, states and scheduler specific 
properties.
ResourceManager will reload the mapred-queues configuration 
file.
   -refreshNodes: Refresh the hosts information at the ResourceManager.
   -refreshSuperUserGroupsConfiguration: Refresh superuser proxy groups mappings
   -refreshUserToGroupsMappings: Refresh user-to-groups mappings
   -refreshAdminAcls: Refresh acls for administration of ResourceManager
   -refreshServiceAcl: Reload the service-level authorization policy file.
ResoureceManager will reload the authorization policy file.
   -getGroups [username]: Get the groups which given user belongs to.
   -help [cmd]: Displays help for the given command or all commands if none is 
specified.
   -addToClusterNodeLabels [label1,label2,label3] (label splitted by ","): add 
to cluster node labels
   -removeFromClusterNodeLabels [label1,label2,label3] (label splitted by ","): 
remove from cluster node labels
   -replaceLabelsOnNode [node1:port,label1,label2 node2:port,label1,label2]: 
replace labels on nodes
   -directlyAccessNodeLabelStore: Directly access node label store, with this 
option, all node label related operations will not connect RM. Instead, they 
will access/modify stored node labels directly. By default, it is false (access 
via RM). AND PLEASE NOTE: if you configured yarn.node-labels.fs-store.root-dir 
to a local directory (instead of NFS or HDFS), this option will only work when 
the command run on the machine where RM is running.
{code}
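As a rough illustration of the grouping described above (a hypothetical helper, not the 
actual RMAdminCLI code): the node-label sub-options can be printed as a single bracketed 
group in the usage summary, so that -directlyAccessNodeLabelStore always appears next to 
the commands it modifies.

{code}
// Sketch only: build a usage line that keeps the node-label options together.
// Class and method names are hypothetical.
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

public class UsageGroupingExample {

  static String buildUsage(List<String> standaloneOpts, List<String> nodeLabelOpts) {
    StringJoiner usage = new StringJoiner(" ", "yarn rmadmin ", "");
    for (String opt : standaloneOpts) {
      usage.add("[" + opt + "]");
    }
    // Emit the node-label options as one nested group so the modifier option
    // is visually tied to the commands it applies to.
    StringJoiner group = new StringJoiner(" ", "[", "]");
    for (String opt : nodeLabelOpts) {
      group.add("[" + opt + "]");
    }
    usage.add(group.toString());
    usage.add("[-help [cmd]]");
    return usage.toString();
  }

  public static void main(String[] args) {
    System.out.println(buildUsage(
        Arrays.asList("-refreshQueues", "-refreshNodes", "-refreshAdminAcls"),
        Arrays.asList("-addToClusterNodeLabels [label1,label2,label3]",
                      "-removeFromClusterNodeLabels [label1,label2,label3]",
                      "-replaceLabelsOnNode [node1:port,label1,label2]",
                      "-directlyAccessNodeLabelStore")));
  }
}
{code}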

> NodeLabel operations in RMAdmin CLI get missing in help command.
> 
>
> Key: YARN-2970
> URL: https://issues.apache.org/jira/browse/YARN-2970
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Junping Du
>Assignee: Varun Saxena
>Priority: Minor
> Attachments: YARN-2970.patch
>
>
> NodeLabel operations in RMAdmin CLI get missing in help command when I am 
> debugging YARN-313, we should add them on as other cmds:
> {noformat} 
> yarn rmadmin [-refreshQueues] [-refreshNodes] [-refreshResources] 
> [-refreshSuperUserGroupsConfiguration] [-refreshUserToGroupsMappings] 
> [-refreshAdminAcls] [-refreshServiceAcl] [-getGroup [username]] 
> [-updateNodeResource [NodeID] [MemSize] [vCores] ([OvercommitTimeout]) [-help 
> [cmd]]
>-refreshQueues: Reload the queues' acls, states and scheduler specific 
> properties.
> ResourceManager will reload the mapred-queues configuration 
> file.
>-refreshNodes: Refresh the hosts information at the ResourceManager.
>-refreshResources: Refresh resources of NodeManagers at the 
> ResourceManager.
>-refreshSuperUserGroupsConfiguration: Refresh superuser proxy groups 
> mappings
>-refreshUserToGroupsMappings: Refresh user-to-groups mappings
>-refreshAdminAcls: Refresh acls for administration of ResourceManager
>-refreshServiceAcl: Reload the service-level authorization policy file.
> ResoureceManager will reload the authorization policy file.
>-getGroups [username]: Get the groups which given user belongs to.
>-updateNodeResource [NodeID] [MemSize] [vCores] ([OvercommitTimeout]): 
> Update resource on specific node.
>-help [cmd]: Displays help for the given command or all commands if none 
> is specified.
>-addToClusterNodeLabels [label1,label2,label3] (label splitted by ","): 
> add to cluster node labels
>-removeFromClusterNodeLabels [label1,label2,label3] (label splitted by 
> ","): remove from cluster node labels
>-replaceLabelsOnNode [node1:port,label1,label2 node2:port,label1,label2]: 
> replace labels on nodes
>-directlyAccessNodeLabelStore: Directly access node label store, with this 
> option, all node label related operations will not connect RM. Instead, they 
> will access/modify stored node labels directly. By default, it is false 

[jira] [Commented] (YARN-2230) Fix description of yarn.scheduler.maximum-allocation-vcores in yarn-default.xml (or code)

2014-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250340#comment-14250340
 ] 

Hadoop QA commented on YARN-2230:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687787/YARN-2230.001.patch
  against trunk revision 4281c96.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 25 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6131//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6131//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6131//console

This message is automatically generated.

> Fix description of yarn.scheduler.maximum-allocation-vcores in 
> yarn-default.xml (or code)
> -
>
> Key: YARN-2230
> URL: https://issues.apache.org/jira/browse/YARN-2230
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client, scheduler
>Affects Versions: 2.4.0
>Reporter: Adam Kawa
>Priority: Minor
> Attachments: YARN-2230.001.patch
>
>
> When a user requests more vcores than the allocation limit (e.g. 
> mapreduce.map.cpu.vcores  is larger than 
> yarn.scheduler.maximum-allocation-vcores), then 
> InvalidResourceRequestException is thrown - 
> https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
> {code}
> if (resReq.getCapability().getVirtualCores() < 0 ||
> resReq.getCapability().getVirtualCores() >
> maximumResource.getVirtualCores()) {
>   throw new InvalidResourceRequestException("Invalid resource request"
>   + ", requested virtual cores < 0"
>   + ", or requested virtual cores > max configured"
>   + ", requestedVirtualCores="
>   + resReq.getCapability().getVirtualCores()
>   + ", maxVirtualCores=" + maximumResource.getVirtualCores());
> }
> {code}
> According to documentation - yarn-default.xml 
> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml,
>  the request should be capped to the allocation limit.
> {code}
>   
> The maximum allocation for every container request at the RM,
> in terms of virtual CPU cores. Requests higher than this won't take 
> effect,
> and will get capped to this value.
> yarn.scheduler.maximum-allocation-vcores
> 32
>   
> {code}
> This means that:
> * Either documentation or code should be corrected (unless this exception is 
> handled elsewhere accordingly, but it looks that it is not).
> This behavior is confusing, because when such a job (with 
> mapreduce.map.cpu.vcores is larger than 
> yarn.scheduler.maximum-allocation-vcores) is submitted, it does not make any 
> progress. The warnings/exceptions are thrown at the scheduler (RM) side e.g.
> {code}
> 2014-06-29 00:34:51,469 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
> Invalid resource ask by application appattempt_1403993411503_0002_01
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested virtual cores < 0, or requested virtual cores > 
> max configured, requestedVirtualCores=32, maxVirtualCores=3
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:237)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.validateResourceRequests(RMServerUtils.java:80)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(Application

[jira] [Commented] (YARN-2189) Admin service for cache manager

2014-12-17 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250344#comment-14250344
 ] 

Karthik Kambatla commented on YARN-2189:


Thanks, Allen, for the trunk addendum. +1 on that. I will check it in to trunk shortly and 
update my patch for branch-2. Mind taking a look?

> Admin service for cache manager
> ---
>
> Key: YARN-2189
> URL: https://issues.apache.org/jira/browse/YARN-2189
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: YARN-1492-trunk-addendum.patch, 
> YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, 
> YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch, 
> YARN-2189-trunk-v7.patch, yarn-2189-branch2.addendum-1.patch
>
>
> Implement the admin service for the shared cache manager. This service is 
> responsible for handling administrative commands such as manually running a 
> cleaner task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2189) Admin service for cache manager

2014-12-17 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2189:
---
Attachment: yarn-2189-branch2.addendum-2.patch

Updated the addendum patch for branch-2. However, none of the entries are sorted. Maybe we 
should follow up on the sorting in another JIRA.

> Admin service for cache manager
> ---
>
> Key: YARN-2189
> URL: https://issues.apache.org/jira/browse/YARN-2189
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: YARN-1492-trunk-addendum.patch, 
> YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, 
> YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch, 
> YARN-2189-trunk-v7.patch, yarn-2189-branch2.addendum-1.patch, 
> yarn-2189-branch2.addendum-2.patch
>
>
> Implement the admin service for the shared cache manager. This service is 
> responsible for handling administrative commands such as manually running a 
> cleaner task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2189) Admin service for cache manager

2014-12-17 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250368#comment-14250368
 ] 

Karthik Kambatla commented on YARN-2189:


BTW, committed the trunk addendum to trunk. 

> Admin service for cache manager
> ---
>
> Key: YARN-2189
> URL: https://issues.apache.org/jira/browse/YARN-2189
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: YARN-1492-trunk-addendum.patch, 
> YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, 
> YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch, 
> YARN-2189-trunk-v7.patch, yarn-2189-branch2.addendum-1.patch, 
> yarn-2189-branch2.addendum-2.patch
>
>
> Implement the admin service for the shared cache manager. This service is 
> responsible for handling administrative commands such as manually running a 
> cleaner task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2189) Admin service for cache manager

2014-12-17 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250367#comment-14250367
 ] 

Karthik Kambatla commented on YARN-2189:


[~aw] - can you please take a look when you get a chance? Thanks. 

> Admin service for cache manager
> ---
>
> Key: YARN-2189
> URL: https://issues.apache.org/jira/browse/YARN-2189
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: YARN-1492-trunk-addendum.patch, 
> YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, 
> YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch, 
> YARN-2189-trunk-v7.patch, yarn-2189-branch2.addendum-1.patch, 
> yarn-2189-branch2.addendum-2.patch
>
>
> Implement the admin service for the shared cache manager. This service is 
> responsible for handling administrative commands such as manually running a 
> cleaner task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2189) Admin service for cache manager

2014-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250375#comment-14250375
 ] 

Hudson commented on YARN-2189:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #6735 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6735/])
YARN-2189. Admin service for cache manager. Addendum to sort entries (kasha: 
rev 9b4ba409c6683c52c8e931809fc47b593bb90b48)
* hadoop-yarn-project/hadoop-yarn/conf/yarn-env.sh
* hadoop-yarn-project/hadoop-yarn/bin/yarn


> Admin service for cache manager
> ---
>
> Key: YARN-2189
> URL: https://issues.apache.org/jira/browse/YARN-2189
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: YARN-1492-trunk-addendum.patch, 
> YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, 
> YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch, 
> YARN-2189-trunk-v7.patch, yarn-2189-branch2.addendum-1.patch, 
> yarn-2189-branch2.addendum-2.patch
>
>
> Implement the admin service for the shared cache manager. This service is 
> responsible for handling administrative commands such as manually running a 
> cleaner task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2189) Admin service for cache manager

2014-12-17 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250382#comment-14250382
 ] 

Allen Wittenauer commented on YARN-2189:


The usage output was fixed in HADOOP-9902, which only went to trunk. +1 on the first 
version of your addendum. The second version needs a description rewrite in yarn-env.sh 
because, again, HADOOP-9902 provides functionality there that doesn't exist in 2.x.

> Admin service for cache manager
> ---
>
> Key: YARN-2189
> URL: https://issues.apache.org/jira/browse/YARN-2189
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: YARN-1492-trunk-addendum.patch, 
> YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, 
> YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch, 
> YARN-2189-trunk-v7.patch, yarn-2189-branch2.addendum-1.patch, 
> yarn-2189-branch2.addendum-2.patch
>
>
> Implement the admin service for the shared cache manager. This service is 
> responsible for handling administrative commands such as manually running a 
> cleaner task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2970) NodeLabel operations in RMAdmin CLI get missing in help command.

2014-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250387#comment-14250387
 ] 

Hadoop QA commented on YARN-2970:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687790/YARN-2970.patch
  against trunk revision 4281c96.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 10 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client:

  org.apache.hadoop.yarn.client.TestResourceTrackerOnHA

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client:

org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6132//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6132//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-client.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6132//console

This message is automatically generated.

> NodeLabel operations in RMAdmin CLI get missing in help command.
> 
>
> Key: YARN-2970
> URL: https://issues.apache.org/jira/browse/YARN-2970
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Junping Du
>Assignee: Varun Saxena
>Priority: Minor
> Attachments: YARN-2970.patch
>
>
> NodeLabel operations in RMAdmin CLI get missing in help command when I am 
> debugging YARN-313, we should add them on as other cmds:
> {noformat} 
> yarn rmadmin [-refreshQueues] [-refreshNodes] [-refreshResources] 
> [-refreshSuperUserGroupsConfiguration] [-refreshUserToGroupsMappings] 
> [-refreshAdminAcls] [-refreshServiceAcl] [-getGroup [username]] 
> [-updateNodeResource [NodeID] [MemSize] [vCores] ([OvercommitTimeout]) [-help 
> [cmd]]
>-refreshQueues: Reload the queues' acls, states and scheduler specific 
> properties.
> ResourceManager will reload the mapred-queues configuration 
> file.
>-refreshNodes: Refresh the hosts information at the ResourceManager.
>-refreshResources: Refresh resources of NodeManagers at the 
> ResourceManager.
>-refreshSuperUserGroupsConfiguration: Refresh superuser proxy groups 
> mappings
>-refreshUserToGroupsMappings: Refresh user-to-groups mappings
>-refreshAdminAcls: Refresh acls for administration of ResourceManager
>-refreshServiceAcl: Reload the service-level authorization policy file.
> ResoureceManager will reload the authorization policy file.
>-getGroups [username]: Get the groups which given user belongs to.
>-updateNodeResource [NodeID] [MemSize] [vCores] ([OvercommitTimeout]): 
> Update resource on specific node.
>-help [cmd]: Displays help for the given command or all commands if none 
> is specified.
>-addToClusterNodeLabels [label1,label2,label3] (label splitted by ","): 
> add to cluster node labels
>-removeFromClusterNodeLabels [label1,label2,label3] (label splitted by 
> ","): remove from cluster node labels
>-replaceLabelsOnNode [node1:port,label1,label2 node2:port,label1,label2]: 
> replace labels on nodes
>-directlyAccessNodeLabelStore: Directly access node label store, with this 
> option, all node label related operations will not connect RM. Instead, they 
> will access/modify stored node labels directly. By default, it is false 
> (access via RM). AND PLEASE NOTE: if you configured 
> yarn.node-labels.fs-store.root-dir to a local directory (instead of NFS or 
> HDFS), this option will only work when the command run on the machine where 
> RM is running.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2189) Admin service for cache manager

2014-12-17 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250396#comment-14250396
 ] 

Karthik Kambatla commented on YARN-2189:


Thanks, Allen. I agree with your assessment.

Let me go ahead and commit the first version, then, so that we have a working yarn 
script.

> Admin service for cache manager
> ---
>
> Key: YARN-2189
> URL: https://issues.apache.org/jira/browse/YARN-2189
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: YARN-1492-trunk-addendum.patch, 
> YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, 
> YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch, 
> YARN-2189-trunk-v7.patch, yarn-2189-branch2.addendum-1.patch, 
> yarn-2189-branch2.addendum-2.patch
>
>
> Implement the admin service for the shared cache manager. This service is 
> responsible for handling administrative commands such as manually running a 
> cleaner task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-2189) Admin service for cache manager

2014-12-17 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla resolved YARN-2189.

Resolution: Fixed

Just committed the addendum to branch-2. 

> Admin service for cache manager
> ---
>
> Key: YARN-2189
> URL: https://issues.apache.org/jira/browse/YARN-2189
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: YARN-1492-trunk-addendum.patch, 
> YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, 
> YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch, 
> YARN-2189-trunk-v7.patch, yarn-2189-branch2.addendum-1.patch, 
> yarn-2189-branch2.addendum-2.patch
>
>
> Implement the admin service for the shared cache manager. This service is 
> responsible for handling administrative commands such as manually running a 
> cleaner task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2189) Admin service for cache manager

2014-12-17 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2189:
---
Attachment: (was: yarn-2189-branch2.addendum-2.patch)

> Admin service for cache manager
> ---
>
> Key: YARN-2189
> URL: https://issues.apache.org/jira/browse/YARN-2189
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: YARN-1492-trunk-addendum.patch, 
> YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, 
> YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch, 
> YARN-2189-trunk-v7.patch, yarn-2189-branch2.addendum-1.patch
>
>
> Implement the admin service for the shared cache manager. This service is 
> responsible for handling administrative commands such as manually running a 
> cleaner task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2970) NodeLabel operations in RMAdmin CLI get missing in help command.

2014-12-17 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250402#comment-14250402
 ] 

Varun Saxena commented on YARN-2970:


The test failure is unrelated. The Findbugs warnings will be resolved by YARN-2937 through YARN-2940.

> NodeLabel operations in RMAdmin CLI get missing in help command.
> 
>
> Key: YARN-2970
> URL: https://issues.apache.org/jira/browse/YARN-2970
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Junping Du
>Assignee: Varun Saxena
>Priority: Minor
> Attachments: YARN-2970.patch
>
>
> NodeLabel operations in RMAdmin CLI get missing in help command when I am 
> debugging YARN-313, we should add them on as other cmds:
> {noformat} 
> yarn rmadmin [-refreshQueues] [-refreshNodes] [-refreshResources] 
> [-refreshSuperUserGroupsConfiguration] [-refreshUserToGroupsMappings] 
> [-refreshAdminAcls] [-refreshServiceAcl] [-getGroup [username]] 
> [-updateNodeResource [NodeID] [MemSize] [vCores] ([OvercommitTimeout]) [-help 
> [cmd]]
>-refreshQueues: Reload the queues' acls, states and scheduler specific 
> properties.
> ResourceManager will reload the mapred-queues configuration 
> file.
>-refreshNodes: Refresh the hosts information at the ResourceManager.
>-refreshResources: Refresh resources of NodeManagers at the 
> ResourceManager.
>-refreshSuperUserGroupsConfiguration: Refresh superuser proxy groups 
> mappings
>-refreshUserToGroupsMappings: Refresh user-to-groups mappings
>-refreshAdminAcls: Refresh acls for administration of ResourceManager
>-refreshServiceAcl: Reload the service-level authorization policy file.
> ResoureceManager will reload the authorization policy file.
>-getGroups [username]: Get the groups which given user belongs to.
>-updateNodeResource [NodeID] [MemSize] [vCores] ([OvercommitTimeout]): 
> Update resource on specific node.
>-help [cmd]: Displays help for the given command or all commands if none 
> is specified.
>-addToClusterNodeLabels [label1,label2,label3] (label splitted by ","): 
> add to cluster node labels
>-removeFromClusterNodeLabels [label1,label2,label3] (label splitted by 
> ","): remove from cluster node labels
>-replaceLabelsOnNode [node1:port,label1,label2 node2:port,label1,label2]: 
> replace labels on nodes
>-directlyAccessNodeLabelStore: Directly access node label store, with this 
> option, all node label related operations will not connect RM. Instead, they 
> will access/modify stored node labels directly. By default, it is false 
> (access via RM). AND PLEASE NOTE: if you configured 
> yarn.node-labels.fs-store.root-dir to a local directory (instead of NFS or 
> HDFS), this option will only work when the command run on the machine where 
> RM is running.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2189) Admin service for cache manager

2014-12-17 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250401#comment-14250401
 ] 

Karthik Kambatla commented on YARN-2189:


Removed the second addendum to avoid confusion for those who use JIRA to check out the 
latest patch.

> Admin service for cache manager
> ---
>
> Key: YARN-2189
> URL: https://issues.apache.org/jira/browse/YARN-2189
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: YARN-1492-trunk-addendum.patch, 
> YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, 
> YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch, 
> YARN-2189-trunk-v7.patch, yarn-2189-branch2.addendum-1.patch
>
>
> Implement the admin service for the shared cache manager. This service is 
> responsible for handling administrative commands such as manually running a 
> cleaner task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2189) Admin service for cache manager

2014-12-17 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250403#comment-14250403
 ] 

Karthik Kambatla commented on YARN-2189:


Thanks for the quick review on this, Allen. 

> Admin service for cache manager
> ---
>
> Key: YARN-2189
> URL: https://issues.apache.org/jira/browse/YARN-2189
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: YARN-1492-trunk-addendum.patch, 
> YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, 
> YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch, 
> YARN-2189-trunk-v7.patch, yarn-2189-branch2.addendum-1.patch
>
>
> Implement the admin service for the shared cache manager. This service is 
> responsible for handling administrative commands such as manually running a 
> cleaner task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2945) FSLeafQueue should hold lock before and after sorting runnableApps in assignContainer

2014-12-17 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250420#comment-14250420
 ] 

Karthik Kambatla commented on YARN-2945:


+1. Checking this in. 

> FSLeafQueue should hold lock before and after sorting runnableApps in 
> assignContainer
> -
>
> Key: YARN-2945
> URL: https://issues.apache.org/jira/browse/YARN-2945
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2945.001.patch, YARN-2945.002.patch
>
>
> After YARN-2910, assignContainer hold WriteLock while sorting and ReadLock 
> while referencing runnableApps. This can cause interrupted assignment of 
> containers regardless of the policy.
> {code}
> writeLock.lock();
> try {
>   Collections.sort(runnableApps, comparator);
> } finally {
>   writeLock.unlock();
> }
> readLock.lock();
> try {
>   for (FSAppAttempt sched : runnableApps) {
> if (SchedulerAppUtils.isBlacklisted(sched, node, LOG)) {
>   continue;
> }
> assigned = sched.assignContainer(node);
> if (!assigned.equals(Resources.none())) {
>   break;
> }
>}
> } finally {
>   readLock.unlock();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2945) FSLeafQueue#assignContainer - document the reason for using both write and read locks

2014-12-17 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2945:
---
Summary: FSLeafQueue#assignContainer - document the reason for using both 
write and read locks  (was: FSLeafQueue should hold lock before and after 
sorting runnableApps in assignContainer)

> FSLeafQueue#assignContainer - document the reason for using both write and 
> read locks
> -
>
> Key: YARN-2945
> URL: https://issues.apache.org/jira/browse/YARN-2945
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: YARN-2945.001.patch, YARN-2945.002.patch
>
>
> After YARN-2910, assignContainer hold WriteLock while sorting and ReadLock 
> while referencing runnableApps. This can cause interrupted assignment of 
> containers regardless of the policy.
> {code}
> writeLock.lock();
> try {
>   Collections.sort(runnableApps, comparator);
> } finally {
>   writeLock.unlock();
> }
> readLock.lock();
> try {
>   for (FSAppAttempt sched : runnableApps) {
> if (SchedulerAppUtils.isBlacklisted(sched, node, LOG)) {
>   continue;
> }
> assigned = sched.assignContainer(node);
> if (!assigned.equals(Resources.none())) {
>   break;
> }
>}
> } finally {
>   readLock.unlock();
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2014-12-17 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-2975:
--

 Summary: FSLeafQueue app lists are accessed without required locks
 Key: YARN-2975
 URL: https://issues.apache.org/jira/browse/YARN-2975
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Blocker


YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed without 
locks in other places. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250453#comment-14250453
 ] 

Hudson commented on YARN-2964:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #6736 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6736/])
YARN-2964. FSLeafQueue#assignContainer - document the reason for using both 
write and read locks. (Tsuyoshi Ozawa via kasha) (kasha: rev 
f2d150ea1205b77a75c347ace667b4cd060aaf40)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSLeafQueue.java


> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Jian He
>Priority: Blocker
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (NM livelyness interval) after log aggregation completes.  The result is 
> an oozie job, ex. pig, that will launch many sub-jobs over time will fail if 
> any sub-jobs are launched >10 min after any sub-job completes.  If all other 
> sub-jobs complete within that 10 min window, then the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2868) Add metric for initial container launch time

2014-12-17 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250541#comment-14250541
 ] 

Anubhav Dhoot commented on YARN-2868:
-

Minor nits:
* FSQueueMetrics has some unused imports that were added.
* The formatting for @Metric can be made into one line, similar to the others.

Otherwise LGTM.
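For reference, a minimal sketch of the one-line @Metric style being suggested (assuming 
the usual Hadoop metrics2 annotations; the class, field, and description text here are 
made up, not the actual patch):

{code}
// Sketch only: a one-line @Metric declaration, matching the style of the
// existing fields. Names are illustrative.
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableGaugeLong;

@Metrics(context = "yarn")
class ExampleQueueMetrics {

  // Preferred: annotation, description, and field on one line.
  @Metric("Latency of first container allocation (ms)") MutableGaugeLong firstContainerAllocationDelay;
}
{code}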

> Add metric for initial container launch time
> 
>
> Key: YARN-2868
> URL: https://issues.apache.org/jira/browse/YARN-2868
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ray Chiang
>Assignee: Ray Chiang
>  Labels: metrics, supportability
> Attachments: YARN-2868-01.patch, YARN-2868.002.patch, 
> YARN-2868.003.patch, YARN-2868.004.patch
>
>
> Add a metric to measure the latency between "starting container allocation" 
> and "first container actually allocated".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2014-12-17 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2975:
---
Attachment: yarn-2975-1.patch

Attached a patch that removes the methods that return the lists as-is and replaces their 
uses with other (new) methods.
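As a rough sketch of that direction (a hypothetical class, not the actual patch): instead 
of handing out the underlying list, the queue exposes narrow accessors that take the read 
or write lock internally.

{code}
// Sketch only: replace "return the raw list" getters with accessors that
// acquire the lock themselves. Names are illustrative, not the patch.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockedAppListExample<T> {
  private final List<T> runnableApps = new ArrayList<>();
  private final ReadWriteLock rwLock = new ReentrantReadWriteLock();

  // Instead of: List<T> getRunnableApps() { return runnableApps; }
  boolean isRunnableApp(T app) {
    rwLock.readLock().lock();
    try {
      return runnableApps.contains(app);
    } finally {
      rwLock.readLock().unlock();
    }
  }

  int getNumRunnableApps() {
    rwLock.readLock().lock();
    try {
      return runnableApps.size();
    } finally {
      rwLock.readLock().unlock();
    }
  }

  void addRunnableApp(T app) {
    rwLock.writeLock().lock();
    try {
      runnableApps.add(app);
    } finally {
      rwLock.writeLock().unlock();
    }
  }
}
{code}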

> FSLeafQueue app lists are accessed without required locks
> -
>
> Key: YARN-2975
> URL: https://issues.apache.org/jira/browse/YARN-2975
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: yarn-2975-1.patch
>
>
> YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
> FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
> without locks in other places. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2203) Web UI for cache manager

2014-12-17 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated YARN-2203:
---
Attachment: YARN-2203-trunk-v5.patch

[~kasha] V5 attached.

Added annotations to the classes and a security TODO. Tested manually on a 
pseudo-distributed cluster.

> Web UI for cache manager
> 
>
> Key: YARN-2203
> URL: https://issues.apache.org/jira/browse/YARN-2203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
> YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
> YARN-2203-trunk-v5.patch
>
>
> Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2932) Add entry for preemption setting to queue status screen and startup/refresh logging

2014-12-17 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-2932:
-
Attachment: YARN-2932.v1.txt

> Add entry for preemption setting to queue status screen and startup/refresh 
> logging
> ---
>
> Key: YARN-2932
> URL: https://issues.apache.org/jira/browse/YARN-2932
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.7.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: YARN-2932.v1.txt
>
>
> YARN-2056 enables the ability to turn preemption on or off on a per-queue 
> level. This JIRA will provide the preemption status for each queue in the 
> {{HOST:8088/cluster/scheduler}} UI and in the RM log during startup/queue 
> refresh.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode

2014-12-17 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250656#comment-14250656
 ] 

Karthik Kambatla commented on YARN-2962:


[~rakeshr] - you are right, we saw the second case. 

> ZKRMStateStore: Limit the number of znodes under a znode
> 
>
> Key: YARN-2962
> URL: https://issues.apache.org/jira/browse/YARN-2962
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Varun Saxena
>Priority: Critical
>
> We ran into this issue where we were hitting the default ZK server message 
> size configs, primarily because the message had too many znodes even though 
> individually they were all small.
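One common way to bound fan-out, sketched here purely as an illustration of the idea (not 
the eventual fix): split application znodes across a fixed number of intermediate bucket 
znodes so that no single parent accumulates an unbounded child list. All names below are 
hypothetical.

{code}
// Sketch only: compute a bucketed znode path so that application znodes are
// spread across a fixed number of intermediate parents. Illustrative names.
public final class ZnodeBucketExample {

  private static final int NUM_BUCKETS = 100;

  /** e.g. application_1403993411503_0002 -> /rmstore/apps/bucket-NN/application_... */
  static String bucketedPath(String appId) {
    int bucket = Math.abs(appId.hashCode() % NUM_BUCKETS);
    return String.format("/rmstore/apps/bucket-%02d/%s", bucket, appId);
  }

  public static void main(String[] args) {
    System.out.println(bucketedPath("application_1403993411503_0002"));
  }
}
{code}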



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2976) Invalid docs for specifying yarn.nodemanager.docker-container-executor.exec-name

2014-12-17 Thread Hitesh Shah (JIRA)
Hitesh Shah created YARN-2976:
-

 Summary: Invalid docs for specifying 
yarn.nodemanager.docker-container-executor.exec-name
 Key: YARN-2976
 URL: https://issues.apache.org/jira/browse/YARN-2976
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.6.0
Reporter: Hitesh Shah
Priority: Minor


The docs at 
http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/DockerContainerExecutor.html
mention setting "docker -H=tcp://0.0.0.0:4243" as the value of 
yarn.nodemanager.docker-container-executor.exec-name.

However, the actual implementation does a file-existence check on the specified value.

Either the docs need to be fixed, or the implementation changed to allow relative paths 
or commands with additional arguments.
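To make the mismatch concrete, a small self-contained sketch (mirroring the described 
behavior rather than quoting the actual NodeManager code): a plain file-existence test 
rejects a value that is a command plus arguments.

{code}
// Sketch only: a naive file-existence check, as described above, fails for a
// value like "docker -H=tcp://0.0.0.0:4243" because that whole string is not
// a path to an existing file. Illustrative, not the actual implementation.
import java.io.File;

public class ExecNameCheckExample {
  public static void main(String[] args) {
    String execName = "docker -H=tcp://0.0.0.0:4243";   // value suggested by the docs
    boolean accepted = new File(execName).exists();     // what a fileExists check does
    System.out.println("accepted = " + accepted);       // prints false on a typical host
  }
}
{code}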



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-17 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250672#comment-14250672
 ] 

Karthik Kambatla commented on YARN-2203:


bq. Also, do we want unit tests to ensure the added fields are all present in 
future versions?
Sorry for this super-cryptic comment that took me forever to decipher. I was 
referring to the Web UI: it might be nice to add a unit test to verify that it has 
the expected elements (cache hits, cache misses, etc.).
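A rough sketch of the kind of test being suggested, with entirely hypothetical helper 
names (a real test would render the page through the SCM web app's own test utilities 
rather than the stand-in used here):

{code}
// Sketch only: assert that the rendered cache-manager page contains the
// expected summary labels. renderOverviewPage() is a hypothetical stand-in.
import static org.junit.Assert.assertTrue;
import org.junit.Test;

public class TestCacheManagerPageExample {

  // Stand-in for rendering the SCM overview block to HTML in a test.
  private String renderOverviewPage() {
    return "<div>Cache hits: 0</div><div>Cache misses: 0</div>";
  }

  @Test
  public void overviewShowsCacheCounters() {
    String html = renderOverviewPage();
    assertTrue(html.contains("Cache hits"));
    assertTrue(html.contains("Cache misses"));
  }
}
{code}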

> Web UI for cache manager
> 
>
> Key: YARN-2203
> URL: https://issues.apache.org/jira/browse/YARN-2203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
> YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
> YARN-2203-trunk-v5.patch
>
>
> Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-17 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250691#comment-14250691
 ] 

Jian He commented on YARN-2964:
---

bq. Once the token was stashed in the set, subsequent attempts from sub-jobs to 
store the token would silently be ignored because the token was already in the 
set.
After digging into the code, I found that even when we skip canceling the token because 
the flag is set, we still remove the token from the global set. This means that if a 
sub-job doesn't set the flag, the token will be added to the global set again, and once 
that sub-job finishes the token is canceled. I'm wondering how this worked before. 
[~jlowe], [~daryn], could you shed some light on this?
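To illustrate the behavior being discussed (a simplified model, not the actual 
DelegationTokenRenewer code): if the token is removed from the shared set whenever any 
app finishes, a later sub-job can re-register it with its own cancel flag and cancel it 
out from under the main job.

{code}
// Simplified model of the pitfall described above; names are illustrative.
import java.util.HashMap;
import java.util.Map;

public class TokenTrackingExample {

  /** token -> whether the app that first registered it wants it canceled at end. */
  private final Map<String, Boolean> trackedTokens = new HashMap<>();

  void appSubmitted(String token, boolean cancelAtEnd) {
    // Only the first registration should decide the cancellation policy.
    trackedTokens.putIfAbsent(token, cancelAtEnd);
  }

  void appFinished(String token) {
    Boolean cancelAtEnd = trackedTokens.get(token);
    if (cancelAtEnd == null) {
      return;
    }
    if (cancelAtEnd) {
      System.out.println("canceling " + token);
    }
    // Pitfall: removing the entry here even when cancelAtEnd is false lets a
    // later sub-job re-register the token with cancelAtEnd=true and cancel it
    // while the main job is still running.
    trackedTokens.remove(token);
  }

  public static void main(String[] args) {
    TokenTrackingExample rm = new TokenTrackingExample();
    rm.appSubmitted("oozie-token", false);  // main job: do not cancel at end
    rm.appFinished("oozie-token");          // entry removed anyway
    rm.appSubmitted("oozie-token", true);   // sub-job re-registers the token
    rm.appFinished("oozie-token");          // token now gets canceled
  }
}
{code}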

> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Jian He
>Priority: Blocker
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (NM livelyness interval) after log aggregation completes.  The result is 
> an oozie job, ex. pig, that will launch many sub-jobs over time will fail if 
> any sub-jobs are launched >10 min after any sub-job completes.  If all other 
> sub-jobs complete within that 10 min window, then the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2014-12-17 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250697#comment-14250697
 ] 

Ray Chiang commented on YARN-2975:
--

For removeNonRunnableApp(), isRunnableApp(), and isNonRunnableApp(), shouldn't the 
return statement be in the finally block instead of the try block?

> FSLeafQueue app lists are accessed without required locks
> -
>
> Key: YARN-2975
> URL: https://issues.apache.org/jira/browse/YARN-2975
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: yarn-2975-1.patch
>
>
> YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
> FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
> without locks in other places. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2014-12-17 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250699#comment-14250699
 ] 

Ray Chiang commented on YARN-2975:
--

Same with getNumRunnableApps() and getNumNonRunnableApps().

> FSLeafQueue app lists are accessed without required locks
> -
>
> Key: YARN-2975
> URL: https://issues.apache.org/jira/browse/YARN-2975
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: yarn-2975-1.patch
>
>
> YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
> FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
> without locks in other places. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2014-12-17 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250707#comment-14250707
 ] 

Karthik Kambatla commented on YARN-2975:


It shouldn't matter either way. I would prefer to leave it as is, or return 
after the finally block. 
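Both forms behave the same, as a quick self-contained sketch shows: a return inside the 
try still runs the finally block (and thus the unlock) before the method actually returns. 
The class below is purely illustrative.

{code}
// Sketch: the finally block runs before either return completes, so the lock
// is released in both variants.
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class ReturnInTryExample {
  private static final Lock LOCK = new ReentrantLock();

  static int returnInsideTry() {
    LOCK.lock();
    try {
      return 42;              // finally still executes before the caller sees 42
    } finally {
      LOCK.unlock();
    }
  }

  static int returnAfterFinally() {
    int result;
    LOCK.lock();
    try {
      result = 42;
    } finally {
      LOCK.unlock();
    }
    return result;            // equivalent behavior, just more verbose
  }

  public static void main(String[] args) {
    System.out.println(returnInsideTry() + " " + returnAfterFinally());
  }
}
{code}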

> FSLeafQueue app lists are accessed without required locks
> -
>
> Key: YARN-2975
> URL: https://issues.apache.org/jira/browse/YARN-2975
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: yarn-2975-1.patch
>
>
> YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
> FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
> without locks in other places. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2944) SCMStore/InMemorySCMStore is not currently compatible with ReflectionUtils#newInstance

2014-12-17 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250758#comment-14250758
 ] 

Chris Trezzo commented on YARN-2944:


Not sure why HadoopQA -1'd it overall, but the patch should be good to go.

> SCMStore/InMemorySCMStore is not currently compatible with 
> ReflectionUtils#newInstance
> --
>
> Key: YARN-2944
> URL: https://issues.apache.org/jira/browse/YARN-2944
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Minor
> Attachments: YARN-2944-trunk-v1.patch, YARN-2944-trunk-v2.patch, 
> YARN-2944-trunk-v3.patch
>
>
> Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create 
> the SCMStore service. Unfortunately the SCMStore class does not have a 
> 0-argument constructor.
> On startup, the SCM fails with the following:
> {noformat}
> 14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager 
> failed in state INITED; cause: java.lang.RuntimeException: 
> java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.()
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.()
> at java.lang.Class.getConstructor0(Class.java:2763)
> at java.lang.Class.getDeclaredConstructor(Class.java:2021)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
> ... 4 more
> 14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting 
> SharedCacheManager
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.()
> at java.lang.Class.getConstructor0(Class.java:2763)
> at java.lang.Class.getDeclaredConstructor(Class.java:2021)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
> ... 4 more
> {noformat}
> This JIRA is to add a 0-argument constructor to SCMStore.
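
As a side note for readers, here is a minimal sketch of the fix pattern the
description above calls for, assuming hadoop-common on the classpath.
ExampleStore is a made-up class name, not the real SCMStore/InMemorySCMStore:
ReflectionUtils#newInstance looks up a 0-argument constructor via
Class#getDeclaredConstructor(), so configuration-driven setup has to move into
serviceInit(Configuration) rather than the constructor.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;
import org.apache.hadoop.util.ReflectionUtils;

// Hypothetical store service illustrating the fix pattern; the real
// SCMStore/InMemorySCMStore classes differ.
public class ExampleStore extends AbstractService {

  // The 0-argument constructor is what ReflectionUtils#newInstance needs;
  // without it, instantiation fails with NoSuchMethodException.
  public ExampleStore() {
    super(ExampleStore.class.getName());
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Reflection-based creation now succeeds; setup driven by conf belongs in
    // serviceInit(conf), which init(conf) invokes.
    ExampleStore store = ReflectionUtils.newInstance(ExampleStore.class, conf);
    store.init(conf);
  }
}
{code}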



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2014-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250771#comment-14250771
 ] 

Hadoop QA commented on YARN-2975:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687832/yarn-2975-1.patch
  against trunk revision f2d150e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 14 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6133//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6133//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6133//console

This message is automatically generated.

> FSLeafQueue app lists are accessed without required locks
> -
>
> Key: YARN-2975
> URL: https://issues.apache.org/jira/browse/YARN-2975
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: yarn-2975-1.patch
>
>
> YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
> FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
> without locks in other places. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250773#comment-14250773
 ] 

Hadoop QA commented on YARN-2203:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12687835/YARN-2203-trunk-v5.patch
  against trunk revision f2d150e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 14 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationACLs
  
org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication
  
org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6134//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6134//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6134//console

This message is automatically generated.

> Web UI for cache manager
> 
>
> Key: YARN-2203
> URL: https://issues.apache.org/jira/browse/YARN-2203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
> YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
> YARN-2203-trunk-v5.patch
>
>
> Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2932) Add entry for preemption setting to queue status screen and startup/refresh logging

2014-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250788#comment-14250788
 ] 

Hadoop QA commented on YARN-2932:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687837/YARN-2932.v1.txt
  against trunk revision f2d150e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 14 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
  
org.apache.hadoop.yarn.server.resourcemanager.TestMoveApplication

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6135//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6135//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6135//console

This message is automatically generated.

> Add entry for preemption setting to queue status screen and startup/refresh 
> logging
> ---
>
> Key: YARN-2932
> URL: https://issues.apache.org/jira/browse/YARN-2932
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.7.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: YARN-2932.v1.txt
>
>
> YARN-2056 enables the ability to turn preemption on or off on a per-queue 
> level. This JIRA will provide the preemption status for each queue in the 
> {{HOST:8088/cluster/scheduler}} UI and in the RM log during startup/queue 
> refresh.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2014-12-17 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250792#comment-14250792
 ] 

Karthik Kambatla commented on YARN-2975:


None of the findbugs warnings are related to this patch. 

> FSLeafQueue app lists are accessed without required locks
> -
>
> Key: YARN-2975
> URL: https://issues.apache.org/jira/browse/YARN-2975
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: yarn-2975-1.patch
>
>
> YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
> FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
> without locks in other places. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-17 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250802#comment-14250802
 ] 

Jason Lowe commented on YARN-2964:
--

IIUC it worked in the past because typically the Oozie launcher job hangs 
around waiting for all the sub-jobs to complete (e.g. the launcher is running a 
Pig client).  Since the launcher job was the first to request the token, it's 
the one that remains in the set.  Any attempt by a sub-job to add the token 
will not actually add it, because of the way the hashCode and equals methods on 
DelegationTokenToRenew work.  Therefore, when a sub-job completes and tries 
to remove the tokens, this token will not match, because the app ID is the 
launcher's and not the sub-job's.
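
For illustration, a generic sketch of the Set behaviour described above. The
class and field names are made up and this is not the real
DelegationTokenToRenew code; the point is just that when equals/hashCode ignore
the application ID, a sub-job's add is a no-op and the stored entry keeps the
launcher's app ID.

{code}
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for the token-tracking entry; NOT the real class.
class TrackedToken {
  final String tokenIdentifier; // stands in for the delegation token
  final String appId;           // deliberately ignored by equals/hashCode

  TrackedToken(String tokenIdentifier, String appId) {
    this.tokenIdentifier = tokenIdentifier;
    this.appId = appId;
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof TrackedToken
        && ((TrackedToken) o).tokenIdentifier.equals(tokenIdentifier);
  }

  @Override
  public int hashCode() {
    return Objects.hash(tokenIdentifier);
  }

  public static void main(String[] args) {
    Set<TrackedToken> tokens = new HashSet<>();
    tokens.add(new TrackedToken("hdfs-token-1", "launcher-app"));
    // The sub-job "adds" an equal token, but the set keeps the existing
    // element, so the stored appId remains the launcher's.
    tokens.add(new TrackedToken("hdfs-token-1", "sub-job-app"));
    System.out.println(tokens.size());                  // 1
    System.out.println(tokens.iterator().next().appId); // launcher-app
  }
}
{code}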

> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Jian He
>Priority: Blocker
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (NM livelyness interval) after log aggregation completes.  The result is 
> an oozie job, ex. pig, that will launch many sub-jobs over time will fail if 
> any sub-jobs are launched >10 min after any sub-job completes.  If all other 
> sub-jobs complete within that 10 min window, then the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-17 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250812#comment-14250812
 ] 

Jian He commented on YARN-2964:
---

I see, I missed the part where the launcher job waits for the sub-jobs to 
complete.  Thanks for your explanation!

> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Jian He
>Priority: Blocker
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (NM livelyness interval) after log aggregation completes.  The result is 
> an oozie job, ex. pig, that will launch many sub-jobs over time will fail if 
> any sub-jobs are launched >10 min after any sub-job completes.  If all other 
> sub-jobs complete within that 10 min window, then the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-17 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250819#comment-14250819
 ] 

Karthik Kambatla commented on YARN-2964:


IIRC, the launcher job waits for all actions but the MR action. As an 
optimization, Oozie started exiting the launcher for pure MR actions. 
[~rkanter]? 

> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Jian He
>Priority: Blocker
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (NM livelyness interval) after log aggregation completes.  The result is 
> an oozie job, ex. pig, that will launch many sub-jobs over time will fail if 
> any sub-jobs are launched >10 min after any sub-job completes.  If all other 
> sub-jobs complete within that 10 min window, then the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2944) SCMStore/InMemorySCMStore is not currently compatible with ReflectionUtils#newInstance

2014-12-17 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250847#comment-14250847
 ] 

Karthik Kambatla commented on YARN-2944:


+1. Checking this in. 

> SCMStore/InMemorySCMStore is not currently compatible with 
> ReflectionUtils#newInstance
> --
>
> Key: YARN-2944
> URL: https://issues.apache.org/jira/browse/YARN-2944
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Minor
> Attachments: YARN-2944-trunk-v1.patch, YARN-2944-trunk-v2.patch, 
> YARN-2944-trunk-v3.patch
>
>
> Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create 
> the SCMStore service. Unfortunately the SCMStore class does not have a 
> 0-argument constructor.
> On startup, the SCM fails with the following:
> {noformat}
> 14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager 
> failed in state INITED; cause: java.lang.RuntimeException: 
> java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.()
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.()
> at java.lang.Class.getConstructor0(Class.java:2763)
> at java.lang.Class.getDeclaredConstructor(Class.java:2021)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
> ... 4 more
> 14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting 
> SharedCacheManager
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.()
> at java.lang.Class.getConstructor0(Class.java:2763)
> at java.lang.Class.getDeclaredConstructor(Class.java:2021)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
> ... 4 more
> {noformat}
> This JIRA is to add a 0-argument constructor to SCMStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2944) InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance

2014-12-17 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2944:
---
Summary: InMemorySCMStore can not be instantiated with 
ReflectionUtils#newInstance  (was: SCMStore/InMemorySCMStore is not currently 
compatible with ReflectionUtils#newInstance)

> InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance
> -
>
> Key: YARN-2944
> URL: https://issues.apache.org/jira/browse/YARN-2944
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Minor
> Attachments: YARN-2944-trunk-v1.patch, YARN-2944-trunk-v2.patch, 
> YARN-2944-trunk-v3.patch
>
>
> Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create 
> the SCMStore service. Unfortunately the SCMStore class does not have a 
> 0-argument constructor.
> On startup, the SCM fails with the following:
> {noformat}
> 14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager 
> failed in state INITED; cause: java.lang.RuntimeException: 
> java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.()
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.()
> at java.lang.Class.getConstructor0(Class.java:2763)
> at java.lang.Class.getDeclaredConstructor(Class.java:2021)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
> ... 4 more
> 14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting 
> SharedCacheManager
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.()
> at java.lang.Class.getConstructor0(Class.java:2763)
> at java.lang.Class.getDeclaredConstructor(Class.java:2021)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
> ... 4 more
> {noformat}
> This JIRA is to add a 0-argument constructor to SCMStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2664) Improve RM webapp to expose info about reservations.

2014-12-17 Thread Matteo Mazzucchelli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250851#comment-14250851
 ] 

Matteo Mazzucchelli commented on YARN-2664:
---

Hi [~adhoot], thanks for taking a look.  If it helps, I can walk you through 
what the patch does, or run some additional tests, if you think that would be 
useful.

> Improve RM webapp to expose info about reservations.
> 
>
> Key: YARN-2664
> URL: https://issues.apache.org/jira/browse/YARN-2664
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Carlo Curino
>Assignee: Matteo Mazzucchelli
> Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, 
> YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.4.patch, YARN-2664.5.patch, 
> YARN-2664.6.patch, YARN-2664.7.patch, YARN-2664.patch, legal.patch, 
> screenshot_reservation_UI.pdf
>
>
> YARN-1051 provides a new functionality in the RM to ask for reservation on 
> resources. Exposing this through the webapp GUI is important.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-17 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250872#comment-14250872
 ] 

Chris Trezzo commented on YARN-2203:


The findbugs warnings and test failures are unrelated.

> Web UI for cache manager
> 
>
> Key: YARN-2203
> URL: https://issues.apache.org/jira/browse/YARN-2203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
> YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
> YARN-2203-trunk-v5.patch
>
>
> Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation

2014-12-17 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-2637:
--
Attachment: YARN-2637.18.patch

Updated with some changes based on [~leftnoteasy]'s comments.

> maximum-am-resource-percent could be violated when resource of AM is > 
> minimumAllocation
> 
>
> Key: YARN-2637
> URL: https://issues.apache.org/jira/browse/YARN-2637
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Wangda Tan
>Assignee: Craig Welch
>Priority: Critical
> Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
> YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, 
> YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, 
> YARN-2637.2.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch
>
>
> Currently, number of AM in leaf queue will be calculated in following way:
> {code}
> max_am_resource = queue_max_capacity * maximum_am_resource_percent
> #max_am_number = max_am_resource / minimum_allocation
> #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
> {code}
> And when submit new application to RM, it will check if an app can be 
> activated in following way:
> {code}
> for (Iterator i=pendingApplications.iterator(); 
>  i.hasNext(); ) {
>   FiCaSchedulerApp application = i.next();
>   
>   // Check queue limit
>   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
> break;
>   }
>   
>   // Check user limit
>   User user = getUser(application.getUser());
>   if (user.getActiveApplications() < 
> getMaximumActiveApplicationsPerUser()) {
> user.activateApplication();
> activeApplications.add(application);
> i.remove();
> LOG.info("Application " + application.getApplicationId() +
> " from user: " + application.getUser() + 
> " activated in queue: " + getQueueName());
>   }
> }
> {code}
> An example is,
> If a queue has capacity = 1G, max_am_resource_percent  = 0.2, the maximum 
> resource that AM can use is 200M, assuming minimum_allocation=1M, #am can be 
> launched is 200, and if user uses 5M for each AM (> minimum_allocation). All 
> apps can still be activated, and it will occupy all resource of a queue 
> instead of only a max_am_resource_percent of a queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2148) TestNMClient failed due more exit code values added and passed to AM

2014-12-17 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250881#comment-14250881
 ] 

Junping Du commented on YARN-2148:
--

I still observed the test failure on slow machines for this case.  I think it 
could be due to the slow start of CLEANUP_CONTAINER on this testbed, so I agree 
with [~zjshen]'s comments that exit code 0 is still possible.
I will file a separate JIRA and deliver a quick patch to fix it (and make the 
test include more informative messages).


> TestNMClient failed due more exit code values added and passed to AM
> 
>
> Key: YARN-2148
> URL: https://issues.apache.org/jira/browse/YARN-2148
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 3.0.0, 2.5.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Fix For: 2.5.0
>
> Attachments: YARN-2148.patch
>
>
> Currently, TestNMClient will be failed in trunk, see 
> https://builds.apache.org/job/PreCommit-YARN-Build/3959/testReport/junit/org.apache.hadoop.yarn.client.api.impl/TestNMClient/testNMClient/
> {code}
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:385)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:347)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:226)
> {code}
> Test cases in TestNMClient uses following code to verify exit code of 
> COMPLETED containers
> {code}
>   testGetContainerStatus(container, i, ContainerState.COMPLETE,
>   "Container killed by the ApplicationMaster.", Arrays.asList(
>   new Integer[] {137, 143, 0}));
> {code}
> But YARN-2091 added logic to make exit code reflecting the actual status, so 
> exit code of the "killed by ApplicationMaster" will be -105,
> {code}
>   if (container.hasDefaultExitCode()) {
> container.exitCode = exitEvent.getExitCode();
>   }
> {code}
> We should update test case as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-17 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250880#comment-14250880
 ] 

Karthik Kambatla commented on YARN-2203:


Just discussed with Chris offline.  Instead of testing the HTML output, it might 
make sense to add a REST API and test that. 

> Web UI for cache manager
> 
>
> Key: YARN-2203
> URL: https://issues.apache.org/jira/browse/YARN-2203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
> YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
> YARN-2203-trunk-v5.patch
>
>
> Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-17 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250883#comment-14250883
 ] 

Karthik Kambatla commented on YARN-2203:


+1. Checking this in as well. 

> Web UI for cache manager
> 
>
> Key: YARN-2203
> URL: https://issues.apache.org/jira/browse/YARN-2203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
> YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
> YARN-2203-trunk-v5.patch
>
>
> Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2944) InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance

2014-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250888#comment-14250888
 ] 

Hudson commented on YARN-2944:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6741 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6741/])
YARN-2944. InMemorySCMStore can not be instantiated with 
ReflectionUtils#newInstance. (Chris Trezzo via kasha) (kasha: rev 
a1bd1409649da96c9fde4a9f9398d7711bc6c281)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/TestInMemorySCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestClientSCMProtocolService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/DummyAppChecker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/InMemorySCMStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/TestSharedCacheUploaderService.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/test/java/org/apache/hadoop/yarn/server/sharedcachemanager/store/SCMStoreBaseTest.java


> InMemorySCMStore can not be instantiated with ReflectionUtils#newInstance
> -
>
> Key: YARN-2944
> URL: https://issues.apache.org/jira/browse/YARN-2944
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Minor
> Fix For: 2.7.0
>
> Attachments: YARN-2944-trunk-v1.patch, YARN-2944-trunk-v2.patch, 
> YARN-2944-trunk-v3.patch
>
>
> Currently the Shared Cache Manager uses ReflectionUtils#newInstance to create 
> the SCMStore service. Unfortunately the SCMStore class does not have a 
> 0-argument constructor.
> On startup, the SCM fails with the following:
> {noformat}
> 14/12/09 16:10:53 INFO service.AbstractService: Service SharedCacheManager 
> failed in state INITED; cause: java.lang.RuntimeException: 
> java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.()
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.()
> at java.lang.Class.getConstructor0(Class.java:2763)
> at java.lang.Class.getDeclaredConstructor(Class.java:2021)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
> ... 4 more
> 14/12/09 16:10:53 FATAL sharedcachemanager.SharedCacheManager: Error starting 
> SharedCacheManager
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore.()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.createSCMStoreService(SharedCacheManager.java:103)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.serviceInit(SharedCacheManager.java:65)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.sharedcachemanager.SharedCacheManager.main(SharedCacheManager.java:156)
> Caused by: java.lang.NoSuchMethodException: 
> org

[jira] [Created] (YARN-2977) TestNMClient get failed intermittently

2014-12-17 Thread Junping Du (JIRA)
Junping Du created YARN-2977:


 Summary: TestNMClient get failed intermittently 
 Key: YARN-2977
 URL: https://issues.apache.org/jira/browse/YARN-2977
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Junping Du
Assignee: Junping Du


There are still some test failures for TestNMClient on a slow testbed. As noted 
in my comments on YARN-2148, the container could be finished before 
CLEANUP_CONTAINER happens due to slow start. Let's add back exit code 0 and add 
a more informative message to the test case.
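
A rough sketch of the kind of assertion being proposed - this is not the
attached patch, and the exact set of acceptable exit codes in TestNMClient may
differ; the point is to accept 0 again and to fail with a message that names
the code actually seen:

{code}
import java.util.Arrays;
import java.util.List;
import org.junit.Assert;

// Hedged sketch, not the actual YARN-2977 change.
class ExitCodeAssertionSketch {
  static void assertValidExitCode(int actual) {
    // 137/143 come from the existing TestNMClient expectation quoted in
    // YARN-2148; 0 is the code being added back for fast-finishing containers.
    List<Integer> acceptable = Arrays.asList(137, 143, 0);
    Assert.assertTrue(
        "Unexpected container exit code " + actual
            + ", expected one of " + acceptable,
        acceptable.contains(actual));
  }
}
{code}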



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2148) TestNMClient failed due more exit code values added and passed to AM

2014-12-17 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250889#comment-14250889
 ] 

Junping Du commented on YARN-2148:
--

Filed YARN-2977 to fix this problem.

> TestNMClient failed due more exit code values added and passed to AM
> 
>
> Key: YARN-2148
> URL: https://issues.apache.org/jira/browse/YARN-2148
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client
>Affects Versions: 3.0.0, 2.5.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Fix For: 2.5.0
>
> Attachments: YARN-2148.patch
>
>
> Currently, TestNMClient will be failed in trunk, see 
> https://builds.apache.org/job/PreCommit-YARN-Build/3959/testReport/junit/org.apache.hadoop.yarn.client.api.impl/TestNMClient/testNMClient/
> {code}
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:385)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:347)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:226)
> {code}
> Test cases in TestNMClient uses following code to verify exit code of 
> COMPLETED containers
> {code}
>   testGetContainerStatus(container, i, ContainerState.COMPLETE,
>   "Container killed by the ApplicationMaster.", Arrays.asList(
>   new Integer[] {137, 143, 0}));
> {code}
> But YARN-2091 added logic to make exit code reflecting the actual status, so 
> exit code of the "killed by ApplicationMaster" will be -105,
> {code}
>   if (container.hasDefaultExitCode()) {
> container.exitCode = exitEvent.getExitCode();
>   }
> {code}
> We should update test case as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2977) TestNMClient get failed intermittently

2014-12-17 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2977:
-
Attachment: YARN-2977.patch

Delivering a quick patch to fix the test failure.

> TestNMClient get failed intermittently 
> ---
>
> Key: YARN-2977
> URL: https://issues.apache.org/jira/browse/YARN-2977
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-2977.patch
>
>
> There are still some test failures for TestNMClient on a slow testbed. As 
> noted in my comments on YARN-2148, the container could be finished before 
> CLEANUP_CONTAINER happens due to slow start. Let's add back exit code 0 and 
> add a more informative message to the test case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2977) TestNMClient get failed intermittently

2014-12-17 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-2977:
-
Description: 
There are still some test failures for TestNMClient in slow testbed. Like my 
comments in YARN-2148, the container could be finished before CLEANUP_CONTAINER 
happens due to slow start. Let's add back exit code 0 and add more message for 
test case.
The failure stack:
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:386)
at 
org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:348)
at 
org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:227)

  was:There are still some test failures for TestNMClient on a slow testbed. As 
noted in my comments on YARN-2148, the container could be finished before 
CLEANUP_CONTAINER happens due to slow start. Let's add back exit code 0 and add 
a more informative message to the test case.


> TestNMClient get failed intermittently 
> ---
>
> Key: YARN-2977
> URL: https://issues.apache.org/jira/browse/YARN-2977
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-2977.patch
>
>
> There are still some test failures for TestNMClient on a slow testbed. As 
> noted in my comments on YARN-2148, the container could be finished before 
> CLEANUP_CONTAINER happens due to slow start. Let's add back exit code 0 and 
> add a more informative message to the test case.
> The failure stack:
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:386)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:348)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:227)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2014-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250901#comment-14250901
 ] 

Hudson commented on YARN-1492:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6742 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6742/])
YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) 
(kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java


> truly shared cache for jars (jobjar/libjar)
> ---
>
> Key: YARN-1492
> URL: https://issues.apache.org/jira/browse/YARN-1492
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.0.4-alpha
>Reporter: Sangjin Lee
>Assignee: Chris Trezzo
>Priority: Critical
> Attachments: YARN-1492-all-trunk-v1.patch, 
> YARN-1492-all-trunk-v2.patch, YARN-1492-all-trunk-v3.patch, 
> YARN-1492-all-trunk-v4.patch, YARN-1492-all-trunk-v5.patch, 
> shared_cache_design.pdf, shared_cache_design_v2.pdf, 
> shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
> shared_cache_design_v5.pdf, shared_cache_design_v6.pdf
>
>
> Currently there is the distributed cache that enables you to cache jars and 
> files so that attempts from the same job can reuse them. However, sharing is 
> limited with the distributed cache because it is normally on a per-job basis. 
> On a large cluster, sometimes copying of jobjars and libjars becomes so 
> prevalent that it consumes a large portion of the network bandwidth, not to 
> speak of defeating the purpose of "bringing compute to where data is". This 
> is wasteful because in most cases code doesn't change much across many jobs.
> I'd like to propose and discuss feasibility of introducing a truly shared 
> cache so that multiple jobs from multiple users can share and cache jars. 
> This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2203) Web UI for cache manager

2014-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250902#comment-14250902
 ] 

Hudson commented on YARN-2203:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6742 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6742/])
YARN-2203. [YARN-1492] Web UI for cache manager. (Chris Trezzo via kasha) 
(kasha: rev b7f64823e11f745783607ae5f3f97b5e8e64c389)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMWebServer.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMMetricsInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMOverviewPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/SharedCacheManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-sharedcachemanager/src/main/java/org/apache/hadoop/yarn/server/sharedcachemanager/webapp/SCMController.java


> Web UI for cache manager
> 
>
> Key: YARN-2203
> URL: https://issues.apache.org/jira/browse/YARN-2203
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
> Fix For: 2.7.0
>
> Attachments: SCMUI-trunk-v4.png, YARN-2203-trunk-v1.patch, 
> YARN-2203-trunk-v2.patch, YARN-2203-trunk-v3.patch, YARN-2203-trunk-v4.patch, 
> YARN-2203-trunk-v5.patch
>
>
> Implement the web server and web ui for the cache manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2014-12-17 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250903#comment-14250903
 ] 

Ray Chiang commented on YARN-2975:
--

+1 (non-binding).  Code looks fine to me.  Both unit tests run fine in my tree.

> FSLeafQueue app lists are accessed without required locks
> -
>
> Key: YARN-2975
> URL: https://issues.apache.org/jira/browse/YARN-2975
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: yarn-2975-1.patch
>
>
> YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
> FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
> without locks in other places. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation

2014-12-17 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250908#comment-14250908
 ] 

Craig Welch commented on YARN-2637:
---

[~leftnoteasy], I updated the patch with some changes based on your comments - 
details below:

(first the lesser comments):

bq. 1. The two checks may not be necessary, they will never be null

So, yes and no.  When running "for real", no, they never will be.  We have a 
multitude of mocking cases in tests, however, and at times they were null.  I 
put these checks in to make the exceptions easier to understand in those cases.  
As I had (previously) tracked them all down, I'll go ahead and remove them as 
you suggest, though I have mixed feelings about that, since it may cause 
confusion for a developer down the road...

bq. 2. FiCaSchedulerApp constructor

So, I have left this in - there are a plethora of different places and ways in 
which these are mocked in tests, and without it a great many rather intricate 
changes to the test mocking would be necessary.  If no value is provided when 
running the "real" submission path, this is the default anyway, so it did not 
seem dangerous to propagate that default here for the test cases which do not 
travel that path and would otherwise be subject to NPEs.

bq. MockRM: Why this is needed? Is there any issue of original default value?

Tests were depending on the previous, incorrect behavior to run - the actual 
size of the AMs vs. the cluster size was such that many tests would fail because 
their applications would not start (they are, effectively, "very small 
clusters").  We have targeted tests specific to the max-AM logic where this is 
not in play - for other cases I want to make sure it is "out of the way" so 
that they can concentrate on what they are testing, hence the change in value.

bq. TestApplicationLimits - Can you add a test for accumulated AM resource 
checking? Like app1.am + app2.am < limit but app1.am + app2.am + app3.am > 
limit. 

Yes, I think that's a good test to add - done

-re maximumActiveApplications - this is a good question.  Before this change it 
was possible to effectively set this value by just doing a bit of math, because 
the "pretend" AM size was a fixed value.  Now that the real AM size is being 
used instead, and it can vary, it's no longer possible to effectively set a 
"maxActiveApplications" using the AM resource limit.  When interacting with some 
folks who were doing system testing, and while going through the unit tests, I 
found that people were, in some cases, expecting to be able to do that / 
depending on it.  We also had some unit tests with the same expectation.  
Based on these existing cases I was concerned that, without this, we would be 
taking away a feature that I know is being used.  I think of it as a "backward 
compatibility" requirement, and I do think we need it / it has practical value.  
I've not seen maxActiveApplications per user being used in this way, and it 
would be more difficult to do that anyway, so I did not add that ability (I'm 
of the same opinion that it's better not to add something where there is not a 
clear need for it.)
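
To make the accumulated check concrete, a schematic sketch follows. The numbers
and the method here are hypothetical, not the actual LeafQueue code: an
application is activated only while the sum of the running AMs' resources plus
its own AM resource stays within the queue's AM limit.

{code}
// Schematic sketch of the accumulated AM-resource check; hypothetical
// method and values, not the real CapacityScheduler code.
final class AmLimitSketch {
  static boolean canActivate(long usedAmMemoryMb, long candidateAmMemoryMb,
                             long queueMaxMemoryMb, float maxAmResourcePercent) {
    long amLimitMb = (long) (queueMaxMemoryMb * maxAmResourcePercent);
    // Use the candidate's actual AM size, not minimum_allocation.
    return usedAmMemoryMb + candidateAmMemoryMb <= amLimitMb;
  }

  public static void main(String[] args) {
    // 1000 MB queue, 20% AM limit (200 MB), 80 MB AMs:
    System.out.println(canActivate(0, 80, 1000, 0.2f));   // true  (app1)
    System.out.println(canActivate(80, 80, 1000, 0.2f));  // true  (app1 + app2)
    System.out.println(canActivate(160, 80, 1000, 0.2f)); // false (app3 exceeds the limit)
  }
}
{code}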

> maximum-am-resource-percent could be violated when resource of AM is > 
> minimumAllocation
> 
>
> Key: YARN-2637
> URL: https://issues.apache.org/jira/browse/YARN-2637
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Wangda Tan
>Assignee: Craig Welch
>Priority: Critical
> Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
> YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, 
> YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, 
> YARN-2637.2.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch
>
>
> Currently, number of AM in leaf queue will be calculated in following way:
> {code}
> max_am_resource = queue_max_capacity * maximum_am_resource_percent
> #max_am_number = max_am_resource / minimum_allocation
> #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
> {code}
> And when submit new application to RM, it will check if an app can be 
> activated in following way:
> {code}
> for (Iterator i=pendingApplications.iterator(); 
>  i.hasNext(); ) {
>   FiCaSchedulerApp application = i.next();
>   
>   // Check queue limit
>   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
> break;
>   }
>   
>   // Check user limit
>   User user = getUser(application.getUser());
>   if (user.getActiveApplications() < 
> getMaximumActiveApplicationsPerUser()) {
> user.activateApplication();
> activeApplications.add(application);
> i.remove();
>

[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-17 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250926#comment-14250926
 ] 

Robert Kanter commented on YARN-2964:
-

[~kasha] is correct.  The launcher job waits around for all action types that 
typically submit other MR jobs (Pig, Sqoop, Hive, etc.) except for the MapReduce 
action, which finishes immediately after submitting the "real" MR job.

I just checked, and in the MR launcher, Oozie sets 
{{mapreduce.job.complete.cancel.delegation.tokens}} to {{true}}, and in the 
other launchers, Oozie sets it to {{false}}.  Oozie doesn't touch this 
property in any "real" launched MR jobs, so they'll use the default, which I'm 
guessing is {{true}}.  Thinking about this now, though, it seems like these are 
backwards, so I'm not sure how that's working correctly.

On a related note, we did see an issue recently where a launched job that took 
over 24 hours would cause the launcher to fail with a delegation token issue 
because the token expired; even with the property explicitly set correctly.  
The problem was that {{yarn.resourcemanager.delegation.token.renew-interval}} 
was set to 24 hours (the default) and if you don't renew (or use?) a delegation 
token at least every 24 hours, then it automatically expires.  [~daryn], 
perhaps in the original issue this was set to 10 minutes?  I haven't had a 
chance to look into this, but the fix for this particular issue would be to 
have the launcher job renew the token at some interval.
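
For reference, a hedged sketch of how a client that shares delegation tokens
across several jobs would set this flag; whether it belongs on the launcher or
on the launched job is exactly the open question above, so treat this as
illustrative only.

{code}
import org.apache.hadoop.conf.Configuration;

// Illustrative only; which job type should carry this setting is the open
// question discussed above.
public class SharedTokenJobConf {
  public static Configuration newSharedTokenConf() {
    Configuration conf = new Configuration();
    // false = do not cancel the delegation tokens when this job completes, so
    // other jobs submitted with the same tokens keep working.
    conf.setBoolean("mapreduce.job.complete.cancel.delegation.tokens", false);
    return conf;
  }
}
{code}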

> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Jian He
>Priority: Blocker
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (NM liveliness interval) after log aggregation completes.  The result is 
> that an oozie job (e.g. pig) that launches many sub-jobs over time will fail if 
> any sub-job is launched >10 min after any sub-job completes.  If all other 
> sub-jobs complete within that 10 min window, then the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2939) Fix new findbugs warnings in hadoop-yarn-common

2014-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250930#comment-14250930
 ] 

Hadoop QA commented on YARN-2939:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12687590/YARN-2939-121614.patch
  against trunk revision 9937eef.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 14 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs
  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6136//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6136//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6136//console

This message is automatically generated.

> Fix new findbugs warnings in hadoop-yarn-common
> ---
>
> Key: YARN-2939
> URL: https://issues.apache.org/jira/browse/YARN-2939
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Varun Saxena
>Assignee: Li Lu
>  Labels: findbugs
> Attachments: YARN-2939-120914.patch, YARN-2939-121614.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2975) FSLeafQueue app lists are accessed without required locks

2014-12-17 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250941#comment-14250941
 ] 

Anubhav Dhoot commented on YARN-2975:
-

The removeApp call has gone from one operation to 2 steps.  Just to confirm my 
understanding, are we relying on the locking in FairScheduler's calling methods 
(removeApplicationAttempt etc) to ensure consistency?
boolean wasRunnable = queue.isRunnableApp(attempt);
queue.removeApp(attempt);
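
A hedged, simplified model of the concern (the real FSLeafQueue/FairScheduler 
types are not reproduced here; the class and field names below are made up for 
illustration): both steps need to sit under the same lock that the calling 
methods already hold, otherwise the {{wasRunnable}} answer can go stale between 
the check and the removal.

{code}
import java.util.HashSet;
import java.util.Set;

class LeafQueueModel {
  private final Set<String> runnableApps = new HashSet<String>();
  private final Set<String> nonRunnableApps = new HashSet<String>();

  synchronized boolean isRunnableApp(String attempt) {
    return runnableApps.contains(attempt);
  }

  synchronized void removeApp(String attempt) {
    if (!runnableApps.remove(attempt)) {
      nonRunnableApps.remove(attempt);
    }
  }
}

class SchedulerModel {
  private final Object schedulerLock = new Object();

  // Both steps happen under one lock, so no other thread can move the app
  // between the isRunnableApp() check and the removeApp() call.
  boolean removeApplicationAttempt(LeafQueueModel queue, String attempt) {
    synchronized (schedulerLock) {
      boolean wasRunnable = queue.isRunnableApp(attempt);
      queue.removeApp(attempt);
      return wasRunnable;
    }
  }
}
{code}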

Nit: 
suggest resetPreemptedResources -> resetPreemptedResourcesRunnableApps
clearPreemptedResources -> clearPreemptedResourcesRunnableApps

> FSLeafQueue app lists are accessed without required locks
> -
>
> Key: YARN-2975
> URL: https://issues.apache.org/jira/browse/YARN-2975
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
> Attachments: yarn-2975-1.patch
>
>
> YARN-2910 adds explicit locked access to runnable and non-runnable apps in 
> FSLeafQueue. As FSLeafQueue has getters for these, they can be accessed 
> without locks in other places. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2977) TestNMClient get failed intermittently

2014-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250944#comment-14250944
 ] 

Hadoop QA commented on YARN-2977:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687881/YARN-2977.patch
  against trunk revision b7f6482.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 10 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6138//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6138//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-client.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6138//console

This message is automatically generated.

> TestNMClient get failed intermittently 
> ---
>
> Key: YARN-2977
> URL: https://issues.apache.org/jira/browse/YARN-2977
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-2977.patch
>
>
> There are still some test failures for TestNMClient on slow testbeds. As per my 
> comments in YARN-2148, the container could be finished before 
> CLEANUP_CONTAINER happens due to a slow start. Let's add back exit code 0 and 
> add more messages to the test case.
> The failure stack:
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:386)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:348)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:227)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2977) TestNMClient get failed intermittently

2014-12-17 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250954#comment-14250954
 ] 

Junping Du commented on YARN-2977:
--

Findbugs warnings are not related; YARN-2940 is filed to address them.

> TestNMClient get failed intermittently 
> ---
>
> Key: YARN-2977
> URL: https://issues.apache.org/jira/browse/YARN-2977
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: YARN-2977.patch
>
>
> There are still some test failures for TestNMClient on slow testbeds. As per my 
> comments in YARN-2148, the container could be finished before 
> CLEANUP_CONTAINER happens due to a slow start. Let's add back exit code 0 and 
> add more messages to the test case.
> The failure stack:
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:386)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:348)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:227)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-17 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250958#comment-14250958
 ] 

Jian He commented on YARN-2964:
---

bq. we did see an issue recently where a launched job that took over 24 hours 
would cause the launcher to fail with a delegation token issue because the 
token expired;
This is because the token is removed from the RM's DelegationTokenRenewer even 
though the flag is set to false. Hence, the RM won't renew the token. This will 
cause the oozie job to fail after 24 hrs, which should be an existing issue. I'm 
working on a patch to make this no worse than before. The patch is based on the 
assumption that the launcher job waits for all actions to complete. 

In addition, I think it may make sense for oozie to propagate this flag to 
other actions as well.  Or we can take another approach: introduce an application 
group Id to indicate a group of applications (as in the oozie case), tie the 
token lifetime to the group, and drop this flag completely. 


> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Jian He
>Priority: Blocker
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (NM liveliness interval) after log aggregation completes.  The result is 
> that an oozie job (e.g. pig) that launches many sub-jobs over time will fail if 
> any sub-job is launched >10 min after any sub-job completes.  If all other 
> sub-jobs complete within that 10 min window, then the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2972) DelegationTokenRenewer thread pool never expands

2014-12-17 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250965#comment-14250965
 ] 

Junping Du commented on YARN-2972:
--

I have committed it to trunk and branch-2. Thanks [~jlowe] for contributing the 
patch!

> DelegationTokenRenewer thread pool never expands
> 
>
> Key: YARN-2972
> URL: https://issues.apache.org/jira/browse/YARN-2972
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Fix For: 2.7.0
>
> Attachments: YARN-2972.001.patch
>
>
> DelegationTokenRenewer uses a thread pool to manage token renewals.  The 
> number of threads is configurable, but unfortunately the pool never expands 
> beyond the hardcoded initial 5 threads because we are using an unbounded 
> LinkedBlockingQueue.  ThreadPoolExecutor only grows the thread pool beyond 
> the core size when the specified queue is full.
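
As a standalone demonstration of that ThreadPoolExecutor behaviour (illustrative 
only, not the DelegationTokenRenewer code): with an unbounded 
LinkedBlockingQueue the pool never grows past corePoolSize, so a core size of 5 
caps the pool at 5 threads no matter what maximumPoolSize says. A common remedy 
is to pass the configured thread count as the core size, or to use a bounded 
queue; whether the attached patch does either is not stated here.

{code}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolGrowthDemo {
  public static void main(String[] args) throws InterruptedException {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        5,                                   // core size (the hardcoded 5)
        50,                                  // configured maximum, never reached
        60L, TimeUnit.SECONDS,
        new LinkedBlockingQueue<Runnable>()  // unbounded: offer() never fails
    );
    for (int i = 0; i < 100; i++) {
      pool.execute(new Runnable() {
        public void run() {
          try { Thread.sleep(1000); } catch (InterruptedException ignored) { }
        }
      });
    }
    Thread.sleep(500);
    // Prints 5: extra tasks queue up instead of triggering new threads.
    System.out.println("pool size = " + pool.getPoolSize());
    pool.shutdownNow();
  }
}
{code}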



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-17 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250968#comment-14250968
 ] 

Robert Kanter commented on YARN-2964:
-

+1 to the idea of groups.  Canceling/not canceling the token the way we do now 
seems kind of hacky.

> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Jian He
>Priority: Blocker
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (NM liveliness interval) after log aggregation completes.  The result is 
> that an oozie job (e.g. pig) that launches many sub-jobs over time will fail if 
> any sub-job is launched >10 min after any sub-job completes.  If all other 
> sub-jobs complete within that 10 min window, then the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2972) DelegationTokenRenewer thread pool never expands

2014-12-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250977#comment-14250977
 ] 

Hudson commented on YARN-2972:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6744 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6744/])
YARN-2972. DelegationTokenRenewer thread pool never expands. Contributed by 
Jason Lowe (junping_du: rev 2b4b0e8847048850206f091c6870a02e08cfe836)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java
* hadoop-yarn-project/CHANGES.txt


> DelegationTokenRenewer thread pool never expands
> 
>
> Key: YARN-2972
> URL: https://issues.apache.org/jira/browse/YARN-2972
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.3.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
> Fix For: 2.7.0
>
> Attachments: YARN-2972.001.patch
>
>
> DelegationTokenRenewer uses a thread pool to manage token renewals.  The 
> number of threads is configurable, but unfortunately the pool never expands 
> beyond the hardcoded initial 5 threads because we are using an unbounded 
> LinkedBlockingQueue.  ThreadPoolExecutor only grows the thread pool beyond 
> the core size when the specified queue is full.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is > minimumAllocation

2014-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250987#comment-14250987
 ] 

Hadoop QA commented on YARN-2637:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687880/YARN-2637.18.patch
  against trunk revision a1bd140.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 14 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6137//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6137//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6137//console

This message is automatically generated.

> maximum-am-resource-percent could be violated when resource of AM is > 
> minimumAllocation
> 
>
> Key: YARN-2637
> URL: https://issues.apache.org/jira/browse/YARN-2637
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Wangda Tan
>Assignee: Craig Welch
>Priority: Critical
> Attachments: YARN-2637.0.patch, YARN-2637.1.patch, 
> YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.15.patch, 
> YARN-2637.16.patch, YARN-2637.17.patch, YARN-2637.18.patch, 
> YARN-2637.2.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch
>
>
> Currently, the number of AMs in a leaf queue is calculated in the following way:
> {code}
> max_am_resource = queue_max_capacity * maximum_am_resource_percent
> #max_am_number = max_am_resource / minimum_allocation
> #max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
> {code}
> And when a new application is submitted to the RM, it will check whether the app 
> can be activated in the following way:
> {code}
> for (Iterator i=pendingApplications.iterator(); 
>  i.hasNext(); ) {
>   FiCaSchedulerApp application = i.next();
>   
>   // Check queue limit
>   if (getNumActiveApplications() >= getMaximumActiveApplications()) {
> break;
>   }
>   
>   // Check user limit
>   User user = getUser(application.getUser());
>   if (user.getActiveApplications() < 
> getMaximumActiveApplicationsPerUser()) {
> user.activateApplication();
> activeApplications.add(application);
> i.remove();
> LOG.info("Application " + application.getApplicationId() +
> " from user: " + application.getUser() + 
> " activated in queue: " + getQueueName());
>   }
> }
> {code}
> An example is:
> If a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum 
> resource that AMs can use is 200M. Assuming minimum_allocation = 1M, the number 
> of AMs that can be launched is 200, and if the user uses 5M for each AM 
> (> minimum_allocation), all apps can still be activated, and they will occupy all 
> the resources of the queue instead of only a max_am_resource_percent of the queue.
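
Rough arithmetic for the example in the description above, as a hedged sketch 
(the 1G queue is taken as 1024M here, and the names are illustrative, not the 
scheduler's actual code): the current check limits the *count* of AMs using 
minimum_allocation, not the AM resource actually requested, which is how the 
cap gets blown past.

{code}
public class AmLimitExample {
  public static void main(String[] args) {
    int queueCapacityMb = 1024;            // the "1G" queue
    double maxAmResourcePercent = 0.2;
    int minimumAllocationMb = 1;
    int actualAmSizeMb = 5;                // each AM really asks for 5M

    int maxAmResourceMb = (int) (queueCapacityMb * maxAmResourcePercent); // ~200M
    int maxAmNumber = maxAmResourceMb / minimumAllocationMb;              // ~200 AMs admitted

    int occupiedMb = maxAmNumber * actualAmSizeMb;                        // ~1020M
    System.out.println("AM resource cap:            " + maxAmResourceMb + "M");
    System.out.println("AMs admitted by count:      " + maxAmNumber);
    System.out.println("Resource those AMs occupy:  " + occupiedMb + "M");
    // The occupied resource far exceeds the cap because the divisor is
    // minimum_allocation rather than the AM's real resource request.
  }
}
{code}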



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-17 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2964:
--
Attachment: YARN-2964.1.patch

Uploaded a patch:
- the patch adds a new map which keeps track of all the tokens. If a token is 
already present, it will not add a new DelegationTokenToRenew instance for that 
token.
- it adds a conditional check in the requestNewHdfsDelegationToken method (missed 
this in YARN-2704)
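
A hedged sketch of the de-duplication idea described above (not the actual 
patch; {{Entry}} stands in for DelegationTokenToRenew and the generic {{T}} for 
the token type): one shared entry per token means a token submitted by many 
apps is tracked once instead of once per app.

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class TokenTracker<T> {
  static class Entry { /* renewal timer, cancel flag, owning apps, ... */ }

  private final ConcurrentMap<T, Entry> allTokens =
      new ConcurrentHashMap<T, Entry>();

  // Returns the existing entry when the token is already tracked, so callers
  // never create a second renewal entry for the same token.
  Entry track(T token) {
    Entry fresh = new Entry();
    Entry existing = allTokens.putIfAbsent(token, fresh);
    return existing != null ? existing : fresh;
  }
}
{code}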

> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Jian He
>Priority: Blocker
> Attachments: YARN-2964.1.patch
>
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (NM liveliness interval) after log aggregation completes.  The result is 
> that an oozie job (e.g. pig) that launches many sub-jobs over time will fail if 
> any sub-job is launched >10 min after any sub-job completes.  If all other 
> sub-jobs complete within that 10 min window, then the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2939) Fix new findbugs warnings in hadoop-yarn-common

2014-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251107#comment-14251107
 ] 

Hadoop QA commented on YARN-2939:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12687590/YARN-2939-121614.patch
  against trunk revision 1050d42.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 14 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched
  
org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6139//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6139//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6139//console

This message is automatically generated.

> Fix new findbugs warnings in hadoop-yarn-common
> ---
>
> Key: YARN-2939
> URL: https://issues.apache.org/jira/browse/YARN-2939
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Varun Saxena
>Assignee: Li Lu
>  Labels: findbugs
> Attachments: YARN-2939-120914.patch, YARN-2939-121614.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2964) RM prematurely cancels tokens for jobs that submit jobs (oozie)

2014-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1425#comment-1425
 ] 

Hadoop QA commented on YARN-2964:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12687918/YARN-2964.1.patch
  against trunk revision 1050d42.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 14 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestRMAdminService
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6140//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6140//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6140//console

This message is automatically generated.

> RM prematurely cancels tokens for jobs that submit jobs (oozie)
> ---
>
> Key: YARN-2964
> URL: https://issues.apache.org/jira/browse/YARN-2964
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Jian He
>Priority: Blocker
> Attachments: YARN-2964.1.patch
>
>
> The RM used to globally track the unique set of tokens for all apps.  It 
> remembered the first job that was submitted with the token.  The first job 
> controlled the cancellation of the token.  This prevented completion of 
> sub-jobs from canceling tokens used by the main job.
> As of YARN-2704, the RM now tracks tokens on a per-app basis.  There is no 
> notion of the first/main job.  This results in sub-jobs canceling tokens and 
> failing the main job and other sub-jobs.  It also appears to schedule 
> multiple redundant renewals.
> The issue is not immediately obvious because the RM will cancel tokens ~10 
> min (NM liveliness interval) after log aggregation completes.  The result is 
> that an oozie job (e.g. pig) that launches many sub-jobs over time will fail if 
> any sub-job is launched >10 min after any sub-job completes.  If all other 
> sub-jobs complete within that 10 min window, then the issue goes unnoticed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2973) Capacity scheduler configuration ACLs not work.

2014-12-17 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251171#comment-14251171
 ] 

Naganarasimha G R commented on YARN-2973:
-

Hi [~rootsongjc] or [~rohithsharma],
Can you set the title and description as per the latest discussions, e.g.: 
"Capacity scheduler configuration ACLs for children queues/sub queues will not 
work if the root queue's default acl configurations are not modified"?

> Capacity scheduler configuration ACLs not work.
> ---
>
> Key: YARN-2973
> URL: https://issues.apache.org/jira/browse/YARN-2973
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 2.5.0
> Environment: ubuntu 12.04, cloudera manager, cdh5.2.1
>Reporter: Jimmy Song
>Assignee: Rohith
>  Labels: acl, capacity-scheduler, yarn
>
> I follow this page to configure yarn: 
> http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html.
>  
> I configured YARN to use capacity scheduler in yarn-site.xml with 
> yarn.resourcemanager.scheduler.class for 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.
>  Then modified capacity-scheduler.xml,
> ___
> 
> 
>   
> yarn.scheduler.capacity.root.queues
> default,extract,report,tool
>   
>   
> yarn.scheduler.capacity.root.state
> RUNNING
>   
>   
> yarn.scheduler.capacity.root.default.acl_submit_applications
> jcsong2, y2 
>   
>   
> yarn.scheduler.capacity.root.default.acl_administer_queue
> jcsong2, y2 
>   
>   
> yarn.scheduler.capacity.root.default.capacity
> 35
>   
>   
> yarn.scheduler.capacity.root.extract.acl_submit_applications
> jcsong2 
>   
>   
> yarn.scheduler.capacity.root.extract.acl_administer_queue
> jcsong2 
>   
>   
> yarn.scheduler.capacity.root.extract.capacity
> 15
>   
>   
> yarn.scheduler.capacity.root.report.acl_submit_applications
> y2 
>   
>   
> yarn.scheduler.capacity.root.report.acl_administer_queue
> y2 
>   
>   
> yarn.scheduler.capacity.root.report.capacity
> 35
>   
>   
> yarn.scheduler.capacity.root.tool.acl_submit_applications
>  
>   
>   
> yarn.scheduler.capacity.root.tool.acl_administer_queue
>  
>   
>   
> yarn.scheduler.capacity.root.tool.capacity
> 15
>   
> 
> ___
> I have enabled the acl in yarn-site.xml, but the user jcsong2 can submit 
> applications to every queue. The queue acl does't work! And the queue used 
> capacity more than it was configured! 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2939) Fix new findbugs warnings in hadoop-yarn-common

2014-12-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251234#comment-14251234
 ] 

Hadoop QA commented on YARN-2939:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12687590/YARN-2939-121614.patch
  against trunk revision 1050d42.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 14 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6141//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6141//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6141//console

This message is automatically generated.

> Fix new findbugs warnings in hadoop-yarn-common
> ---
>
> Key: YARN-2939
> URL: https://issues.apache.org/jira/browse/YARN-2939
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Varun Saxena
>Assignee: Li Lu
>  Labels: findbugs
> Attachments: YARN-2939-120914.patch, YARN-2939-121614.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2962) ZKRMStateStore: Limit the number of znodes under a znode

2014-12-17 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14251292#comment-14251292
 ] 

Rakesh R commented on YARN-2962:


OK. Recently I gave a talk about ZooKeeper. Please refer to the [ZooKeeper In The 
Wild|http://events.linuxfoundation.org/sites/events/files/slides/ZooKeeper%20in%20the%20Wild.pdf]
 presentation slides, where I mentioned a similar case (see page 30). The idea is 
to use a hierarchical structure instead of a flat one. For this, the user needs to 
split the single znode name to form a hierarchy, which allows storing many more 
znodes. AFAIK this is a proven method in the [Apache 
BookKeeper|http://zookeeper.apache.org/bookkeeper/docs/r4.3.0] project.

For example, {{application_1418470446447_0049}} can be split to form a hierarchy 
like {{(app_root)\application_\141\84704\46447_\0049}}. If there is data for this 
application, the user can store it in the leaf znode. Since I am not very 
familiar with YARN, you can probably find a better way to split the znode name for 
holding a large number of znodes. Provide a parser to read it back by iterating 
over the child znodes and re-forming application_1418470446447_0049. Since 
ZooKeeper reads are low latency, I don't expect it to hurt performance, but we 
could run a test and look at the performance numbers.
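
Purely as an illustration of the splitting idea (the chunk sizes and names here 
are arbitrary, not a proposal for the actual YARN layout):

{code}
public class ZnodeSplit {
  // application_1418470446447_0049 -> /application_/141/84704/46447_/0049
  static String toHierarchy(String appId) {
    String[] parts = appId.split("_");   // ["application", "1418470446447", "0049"]
    String ts = parts[1];
    return "/" + parts[0] + "_"
        + "/" + ts.substring(0, 3)
        + "/" + ts.substring(3, 8)
        + "/" + ts.substring(8) + "_"
        + "/" + parts[2];
  }

  // Re-form the flat id by concatenating the path segments back together.
  static String fromHierarchy(String path) {
    return path.substring(1).replace("/", "");
  }

  public static void main(String[] args) {
    String path = toHierarchy("application_1418470446447_0049");
    System.out.println(path);                 // /application_/141/84704/46447_/0049
    System.out.println(fromHierarchy(path));  // application_1418470446447_0049
  }
}
{code}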

-Rakesh

> ZKRMStateStore: Limit the number of znodes under a znode
> 
>
> Key: YARN-2962
> URL: https://issues.apache.org/jira/browse/YARN-2962
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Varun Saxena
>Priority: Critical
>
> We ran into this issue where we were hitting the default ZK server message 
> size configs, primarily because the message had too many znodes, even though 
> individually they were all small.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily

2014-12-17 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal reassigned YARN-2933:
---

Assignee: Mayank Bansal  (was: Wangda Tan)

Taking it over

> Capacity Scheduler preemption policy should only consider capacity without 
> labels temporarily
> -
>
> Key: YARN-2933
> URL: https://issues.apache.org/jira/browse/YARN-2933
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Wangda Tan
>Assignee: Mayank Bansal
>
> Currently, we have capacity enforcement on each queue for each label in 
> CapacityScheduler, but we don't have a preemption policy to support that. 
> YARN-2498 is targeting preemption that respects node labels, but we have 
> some gaps in the code base, like queues/FiCaScheduler should be able to get 
> usedResource/pendingResource, etc. by label. These items potentially require 
> refactoring CS, which we need to spend some time thinking about carefully.
> For now, what we can do immediately is calculate ideal_allocation and 
> preempt containers only for resources on nodes without labels, to avoid 
> regressions like: a cluster has some nodes with labels and some without; assume 
> queueA isn't satisfied for resources without a label, but for now the preemption 
> policy may preempt resources from nodes with labels for queueA, which is not 
> correct.
> Again, it is just a short-term enhancement; YARN-2498 will consider 
> preemption respecting node-labels for Capacity Scheduler, which is our final 
> target. 
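
A hedged sketch of that short-term behaviour (the {{NodeInfo}} type below is a 
stand-in, not the real SchedulerNode API): only nodes carrying no label 
contribute to the resources the preemption policy is allowed to consider.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class NodeInfo {
  Set<String> labels;
  int memoryMb;
}

class NoLabelPreemptionFilter {
  // Labeled nodes are skipped entirely until label-aware preemption (YARN-2498) lands.
  static List<NodeInfo> preemptableNodes(List<NodeInfo> all) {
    List<NodeInfo> result = new ArrayList<NodeInfo>();
    for (NodeInfo n : all) {
      if (n.labels == null || n.labels.isEmpty()) {
        result.add(n);
      }
    }
    return result;
  }
}
{code}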



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2933) Capacity Scheduler preemption policy should only consider capacity without labels temporarily

2014-12-17 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-2933:

Attachment: YARN-2933-1.patch

Attaching a patch that avoids preemption of labeled containers.

Thanks,
Mayank

> Capacity Scheduler preemption policy should only consider capacity without 
> labels temporarily
> -
>
> Key: YARN-2933
> URL: https://issues.apache.org/jira/browse/YARN-2933
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Wangda Tan
>Assignee: Mayank Bansal
> Attachments: YARN-2933-1.patch
>
>
> Currently, we have capacity enforcement on each queue for each label in 
> CapacityScheduler, but we don't have a preemption policy to support that. 
> YARN-2498 is targeting preemption that respects node labels, but we have 
> some gaps in the code base, like queues/FiCaScheduler should be able to get 
> usedResource/pendingResource, etc. by label. These items potentially require 
> refactoring CS, which we need to spend some time thinking about carefully.
> For now, what we can do immediately is calculate ideal_allocation and 
> preempt containers only for resources on nodes without labels, to avoid 
> regressions like: a cluster has some nodes with labels and some without; assume 
> queueA isn't satisfied for resources without a label, but for now the preemption 
> policy may preempt resources from nodes with labels for queueA, which is not 
> correct.
> Again, it is just a short-term enhancement; YARN-2498 will consider 
> preemption respecting node-labels for Capacity Scheduler, which is our final 
> target. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

