[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-27 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718120#comment-14718120
 ] 

Varun Saxena commented on YARN-4074:


bq. So improving the API would be taken up by Varun. Varun?
Yes

> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4074-YARN-2928.POC.001.patch, 
> YARN-4074-YARN-2928.POC.002.patch
>
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.
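
A query shape like "the most recent N flows" maps naturally onto a paged HBase
scan. A minimal, illustrative sketch only; the table name and the assumption
that the flow activity row key sorts newest first are not taken from the
actual schema:

{code}
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.PageFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class RecentFlowsScanSketch {
  public static void main(String[] args) throws Exception {
    int n = 20; // "most recent N flows"
    try (Connection conn =
             ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table =
             conn.getTable(TableName.valueOf("timelineservice.flowactivity"))) {
      Scan scan = new Scan();
      // Assumes the row key embeds an inverted timestamp so a forward scan
      // returns the newest flows first. PageFilter caps each region at n rows,
      // so the caller may still need to trim the combined result.
      scan.setFilter(new PageFilter(n));
      try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result row : scanner) {
          System.out.println(Bytes.toString(row.getRow()));
        }
      }
    }
  }
}
{code}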



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3970) REST api support for Application Priority

2015-08-27 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718117#comment-14718117
 ] 

Sunil G commented on YARN-3970:
---

Hi Naga.
Approach sounds good. The existing application report will provide the running
priority. It's also good to verify whether the app is in the ACCEPTED or
RUNNING state before invoking the scheduler API to change the priority.




> REST api support for Application Priority
> -
>
> Key: YARN-3970
> URL: https://issues.apache.org/jira/browse/YARN-3970
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Naganarasimha G R
>
> REST api support for application priority.
> - get/set priority of an application
> - get default priority of a queue
> - get cluster max priority



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3970) REST api support for Application Priority

2015-08-27 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718109#comment-14718109
 ] 

Naganarasimha G R commented on YARN-3970:
-

Thanks [~rohithsharma],
As per the offline discussion we had, since the CLI does not yet handle
displaying the "default priority of a queue" and "cluster max priority", I am
focusing only on setting (more precisely, updating) the priority of an
application through REST.
P.S. get is already handled as part of the ApplicationReport returned by
getApps.
The REST URL I am planning for updating the priority of an application is
{{"/ws/v1/cluster/apps/\{appid\}/priority"}}, and I plan to invoke
{{rm.getClientRMService().updateApplicationPriority()}} to perform the update.
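
For illustration, a minimal client-side sketch of calling such an endpoint. The
path follows the comment above; the PUT verb, the host/port and the JSON body
shape are assumptions rather than the final API:

{code}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class UpdateAppPrioritySketch {
  public static void main(String[] args) throws Exception {
    String appId = "application_1440000000000_0001"; // hypothetical app id
    URL url = new URL("http://rm-host:8088/ws/v1/cluster/apps/" + appId + "/priority");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("PUT");
    conn.setDoOutput(true);
    conn.setRequestProperty("Content-Type", "application/json");
    // Assumed body shape; the server side would translate this into
    // rm.getClientRMService().updateApplicationPriority(...).
    byte[] body = "{\"priority\": 10}".getBytes(StandardCharsets.UTF_8);
    try (OutputStream out = conn.getOutputStream()) {
      out.write(body);
    }
    System.out.println("HTTP " + conn.getResponseCode());
  }
}
{code}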

> REST api support for Application Priority
> -
>
> Key: YARN-3970
> URL: https://issues.apache.org/jira/browse/YARN-3970
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Naganarasimha G R
>
> REST api support for application priority.
> - get/set priority of an application
> - get default priority of a queue
> - get cluster max priority



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2890) MiniYarnCluster should turn on timeline service if configured to do so

2015-08-27 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2890:
--
   Labels: 2.6.1-candidate 2.7.2-candidate  (was: 2.6.1-candidate)
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation and TestJobHistoryEventHandler, 
TestMRTimelineEventHandling and TestDistributedShell before the push. Patch 
applied cleanly.

> MiniYarnCluster should turn on timeline service if configured to do so
> --
>
> Key: YARN-2890
> URL: https://issues.apache.org/jira/browse/YARN-2890
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Mit Desai
>Assignee: Mit Desai
>  Labels: 2.6.1-candidate, 2.7.2-candidate
> Fix For: 2.6.1, 2.8.0
>
> Attachments: YARN-2890.1.patch, YARN-2890.2.patch, YARN-2890.3.patch, 
> YARN-2890.4.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, 
> YARN-2890.patch, YARN-2890.patch
>
>
> Currently the MiniMRYarnCluster does not consider the configuration value for 
> enabling timeline service before starting. The MiniYarnCluster should only 
> start the timeline service if it is configured to do so.
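
A minimal sketch of the kind of guard the description asks for (illustrative
only, not the committed patch):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class TimelineServiceGuard {
  // Only start the timeline service when the configuration enables it.
  public static boolean shouldStartTimelineService(Configuration conf) {
    return conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED,
        YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED);
  }
}
{code}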



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2905) AggregatedLogsBlock page can infinitely loop if the aggregated log file is corrupted

2015-08-27 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2905:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation before the push. Patch applied cleanly.

> AggregatedLogsBlock page can infinitely loop if the aggregated log file is 
> corrupted
> 
>
> Key: YARN-2905
> URL: https://issues.apache.org/jira/browse/YARN-2905
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.5.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
>Priority: Blocker
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: YARN-2905.patch
>
>
> If the AggregatedLogsBlock page tries to serve up a portion of a log file 
> that has been corrupted (e.g.: like the case that was fixed by YARN-2724) 
> then it can spin forever trying to seek to the targeted log segment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2906) CapacitySchedulerPage shows HTML tags for a queue's Active Users

2015-08-27 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2906:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation before the push. Patch applied cleanly.


> CapacitySchedulerPage shows HTML tags for a queue's Active Users
> 
>
> Key: YARN-2906
> URL: https://issues.apache.org/jira/browse/YARN-2906
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: YARN-2906v1.patch
>
>
> On the capacity scheduler web page, expanding the details of a queue shows 
> HTML tags among the details for the active users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4088) RM should be able to process heartbeats from NM asynchronously

2015-08-27 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717907#comment-14717907
 ] 

Bikas Saha commented on YARN-4088:
--

Right. So the combined objective is to continue to have small heartbeat 
intervals with larger clusters while still using the central scheduler for all 
allocations. Clearly, in theory, that is a bottleneck by design and our attempt 
is to engineer our way out of it for medium size clusters. Right? :)

> RM should be able to process heartbeats from NM asynchronously
> --
>
> Key: YARN-4088
> URL: https://issues.apache.org/jira/browse/YARN-4088
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, scheduler
>Reporter: Srikanth Kandula
>
> Today, the RM sequentially processes one heartbeat after another. 
> Imagine a 3000 server cluster with each server heart-beating every 3s. This 
> gives the RM 1ms on average to process each NM heartbeat. That is tough.
> It is true that there are several underlying datastructures that will be 
> touched during heartbeat processing. So, it is non-trivial to parallelize the 
> NM heartbeat. Yet, it is quite doable...
> Parallelizing the NM heartbeat would substantially improve the scalability of 
> the RM, allowing it to either 
> a) run larger clusters or 
> b) support faster heartbeats or dynamic scaling of heartbeats
> c) take more asks from each application or 
> d) use cleverer/ more expensive algorithms such as node labels or better 
> packing or ...
> Indeed the RM's scalability limit has been cited as the motivating reason for 
> a variety of efforts which will become less needed if this can be solved. 
> Ditto for slow heartbeats.  See Sparrow and Mercury papers for example.
> Can we take a shot at this?
> If not, could we discuss why.
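
For what it's worth, a toy sketch of the "asynchronous" part only: heartbeat
handling is handed off from the RPC thread to a pool. The hard part the
description calls out, safely sharing the scheduler data structures, is left
as a placeholder:

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncHeartbeatSketch {
  // Pool that takes heartbeat processing off the RPC handler thread.
  private final ExecutorService pool = Executors.newFixedThreadPool(8);

  // Called from the RPC layer; returns quickly and defers the heavy work.
  public void onHeartbeat(final String nodeId) {
    pool.submit(new Runnable() {
      @Override
      public void run() {
        // Updating node/scheduler state here still needs its own
        // synchronization; that contention is the real bottleneck.
        process(nodeId);
      }
    });
  }

  private void process(String nodeId) {
    // placeholder for the actual heartbeat handling
  }
}
{code}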



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3250) Support admin cli interface in for Application Priority

2015-08-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717905#comment-14717905
 ] 

Hudson commented on YARN-3250:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2243 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2243/])
YARN-3250. Support admin cli interface in for Application Priority. Contributed 
by Rohith Sharma K S (jianhe: rev a9c8ea71aa427ff5f25caec98be15bc880e578a7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RefreshClusterMaxPriorityResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RefreshClusterMaxPriorityRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceManagerAdministrationProtocolPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/client/ResourceManagerAdministrationProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RefreshClusterMaxPriorityResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/resourcemanager_administration_protocol.proto
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerAdministrationProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RefreshClusterMaxPriorityRequest.java


> Support admin cli interface in for Application Priority
> ---
>
> Key: YARN-3250
> URL: https://issues.apache.org/jira/browse/YARN-3250
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Rohith Sharma K S
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3250-V1.patch, 0002-YARN-3250.patch, 
> 0003-YARN-3250.patch
>
>
> Current Application Priority Manager supports only configuration via file. 
> To support runtime configurations for admin cli and REST, a common management 
> interface has to be added which can be shared with NodeLabelsManager. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2865) Application recovery continuously fails with "Application with id already present. Cannot duplicate"

2015-08-27 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2865:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation and TestRMHA before the push. Patch 
applied cleanly.

> Application recovery continuously fails with "Application with id already 
> present. Cannot duplicate"
> 
>
> Key: YARN-2865
> URL: https://issues.apache.org/jira/browse/YARN-2865
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Critical
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: YARN-2865.1.patch, YARN-2865.patch, YARN-2865.patch
>
>
> YARN-2588 handles the exception thrown while transitioning to active and
> resets activeServices. But it misses clearing the RMContext apps/nodes
> details, ClusterMetrics and QueueMetrics. This causes application recovery
> to fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2414) RM web UI: app page will crash if app is failed before any attempt has been created

2015-08-27 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2414:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation and TestAppPage before the push. Patch 
applied cleanly.

> RM web UI: app page will crash if app is failed before any attempt has been 
> created
> ---
>
> Key: YARN-2414
> URL: https://issues.apache.org/jira/browse/YARN-2414
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: webapp
>Reporter: Zhijie Shen
>Assignee: Wangda Tan
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: YARN-2414.20141104-1.patch, YARN-2414.20141104-2.patch, 
> YARN-2414.patch
>
>
> {code}
> 2014-08-12 16:45:13,573 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
> handling URI: /cluster/app/application_1407887030038_0001
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>   at 
> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
>   at 
> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
>   at 
> com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:84)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
>   at 
> com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
>   at 
> com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
>   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:460)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1191)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>   at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>   at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>   at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>   at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>   at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>   at org.mortbay.jetty.Server.handle(Server.java:326)
>   at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>   at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>   at org.mortbay.jetty.HttpCo

[jira] [Commented] (YARN-4088) RM should be able to process heartbeats from NM asynchronously

2015-08-27 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717887#comment-14717887
 ] 

Srikanth Kandula commented on YARN-4088:


See, the problem with slower heartbeats is that if the tasks are short-running,
there will be a cluster-wide throughput drop due to the feedback delay. This is
one of the points on which Sparrow (Spark) and Mercury hammer YARN... Of
course, reusing containers *can* help, but other ducks have to line up as well.
In general, slowing the heartbeat is not a good thing.

> RM should be able to process heartbeats from NM asynchronously
> --
>
> Key: YARN-4088
> URL: https://issues.apache.org/jira/browse/YARN-4088
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, scheduler
>Reporter: Srikanth Kandula
>
> Today, the RM sequentially processes one heartbeat after another. 
> Imagine a 3000 server cluster with each server heart-beating every 3s. This 
> gives the RM 1ms on average to process each NM heartbeat. That is tough.
> It is true that there are several underlying datastructures that will be 
> touched during heartbeat processing. So, it is non-trivial to parallelize the 
> NM heartbeat. Yet, it is quite doable...
> Parallelizing the NM heartbeat would substantially improve the scalability of 
> the RM, allowing it to either 
> a) run larger clusters or 
> b) support faster heartbeats or dynamic scaling of heartbeats
> c) take more asks from each application or 
> d) use cleverer/ more expensive algorithms such as node labels or better 
> packing or ...
> Indeed the RM's scalability limit has been cited as the motivating reason for 
> a variety of efforts which will become less needed if this can be solved. 
> Ditto for slow heartbeats.  See Sparrow and Mercury papers for example.
> Can we take a shot at this?
> If not, could we discuss why.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2816) NM fail to start with NPE during container recovery

2015-08-27 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2816:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation and TestNMLeveldbStateStoreService 
before the push. Patch applied cleanly.

> NM fail to start with NPE during container recovery
> ---
>
> Key: YARN-2816
> URL: https://issues.apache.org/jira/browse/YARN-2816
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.5.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: YARN-2816.000.patch, YARN-2816.001.patch, 
> YARN-2816.002.patch, leveldb_records.txt
>
>
> NM fail to start with NPE during container recovery.
> We saw the following crash happen:
> 2014-10-30 22:22:37,211 INFO org.apache.hadoop.service.AbstractService: 
> Service 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
>  failed in state INITED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recoverContainer(ContainerManagerImpl.java:289)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.recover(ContainerManagerImpl.java:252)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:235)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:250)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:445)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:492)
> The reason is that some DB files used in NMLeveldbStateStoreService were
> accidentally deleted to save disk space at
> /tmp/hadoop-yarn/yarn-nm-recovery/yarn-nm-state. This leaves some incomplete
> container records which don't have a CONTAINER_REQUEST_KEY_SUFFIX
> (startRequest) entry in the DB. When a container is recovered at
> ContainerManagerImpl#recoverContainer,
> the NullPointerException at the following code causes NM shutdown.
> {code}
> StartContainerRequest req = rcs.getStartRequest();
> ContainerLaunchContext launchContext = req.getContainerLaunchContext();
> {code}
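
One possible shape of a guard for such incomplete records (the types here are
hypothetical stand-ins; the committed patch may handle this differently, e.g.
by failing the container rather than skipping it):

{code}
// Hypothetical stand-in for the recovered-container record; only the
// null-check pattern matters here.
interface RecoveredContainer {
  Object getStartRequest(); // null when the startRequest entry is missing
  String getContainerId();
}

public class RecoveryGuard {
  // Skip (and log) incomplete records instead of letting an NPE kill the NM.
  public static boolean isRecoverable(RecoveredContainer rcs) {
    if (rcs.getStartRequest() == null) {
      System.err.println("Skipping incomplete container record: "
          + rcs.getContainerId());
      return false;
    }
    return true;
  }
}
{code}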



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2856) Application recovery throw InvalidStateTransitonException: Invalid event: ATTEMPT_KILLED at ACCEPTED

2015-08-27 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2856:
--
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation and TestRMAppTransitions before the 
push. Patch applied cleanly.

> Application recovery throw InvalidStateTransitonException: Invalid event: 
> ATTEMPT_KILLED at ACCEPTED
> 
>
> Key: YARN-2856
> URL: https://issues.apache.org/jira/browse/YARN-2856
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Critical
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: YARN-2856.1.patch, YARN-2856.patch
>
>
> It is observed that recovering an application whose attempt has a KILLED
> final state throws the below exception, and the application remains in the
> ACCEPTED state forever.
> {code}
> 2014-11-12 02:34:10,602 | ERROR | AsyncDispatcher event handler | Can't 
> handle this event at current state | 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:673)
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> ATTEMPT_KILLED at ACCEPTED
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:671)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:90)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:730)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:714)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4091) Improvement: Introduce more debug/diagnostics information to detail out scheduler activity

2015-08-27 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717880#comment-14717880
 ] 

Naganarasimha G R commented on YARN-4091:
-

Seems like the goal of YARN-3946 is a subset of this JIRA.

> Improvement: Introduce more debug/diagnostics information to detail out 
> scheduler activity
> --
>
> Key: YARN-4091
> URL: https://issues.apache.org/jira/browse/YARN-4091
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: Improvement on debugdiagnostic information - YARN.pdf
>
>
> As schedulers are improved with various new capabilities, more
> configurations that tune the schedulers start to take actions such as
> limiting how many containers are assigned to an application, or introducing
> a delay before allocating a container, etc.
> There is no clear information passed down from the scheduler to the outside
> world under these various scenarios. This makes debugging much tougher.
> This ticket is an effort to introduce more clearly defined states at the
> various points where the scheduler skips/rejects a container assignment,
> activates an application, etc. Such information will help users know what is
> happening in the scheduler.
> Attaching a short proposal for initial discussion. We would like to improve
> on this as we discuss.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage

2015-08-27 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717865#comment-14717865
 ] 

Sangjin Lee commented on YARN-4053:
---

[~vrushalic], [~jrottinghuis], and I discussed supported types a little more,
and we're of the opinion that we can *start* by supporting only longs for now
(i.e. no floating point types), and consider adding a floating point type
(namely double) to the list of supported types later. So for now, how about
assuming (and enforcing) long as the type of the metric values, and pursuing
how we can add double later if we need it? Thoughts?
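
To make the "longs only for now" proposal concrete, a small sketch of what a
fixed-width encoding (instead of the GenericObjectMapper string form) could
look like; this is illustrative, not necessarily the encoding the patch will
use:

{code}
import org.apache.hadoop.hbase.util.Bytes;

public class LongMetricCodecSketch {
  // Store the metric as a fixed-width 8-byte long so HBase-side readers
  // (and any server-side aggregation) can interpret it without a schema.
  public static byte[] encode(long value) {
    return Bytes.toBytes(value);
  }

  public static long decode(byte[] cellValue) {
    return Bytes.toLong(cellValue);
  }
}
{code}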

> Change the way metric values are stored in HBase Storage
> 
>
> Key: YARN-4053
> URL: https://issues.apache.org/jira/browse/YARN-4053
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4053-YARN-2928.01.patch
>
>
> Currently the HBase implementation uses GenericObjectMapper to convert and
> store values in the backend HBase storage. This converts everything into a
> string representation (an ASCII/UTF-8 encoded byte array).
> While this is fine in most cases, it does not quite serve our use case for
> metrics.
> So we need to decide how we are going to encode and decode metric values and
> store them in HBase.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4053) Change the way metric values are stored in HBase Storage

2015-08-27 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717863#comment-14717863
 ] 

Sangjin Lee commented on YARN-4053:
---

Thanks [~varun_saxena] for the discussion. As you said, one thing that really 
causes issues is when inconsistent values are used for the same metric. At a 
high level, I think we need to ask these questions:

- How important is it to support this scenario?
- If we don't really support this scenario, then what is the minimally 
acceptable behavior if that were to happen?

The gist of the problem is that one cannot really write/read consistent values 
without knowing the "right" type of the metric. The user will likely not know 
that either for the write or read path. In the face of this, the main 
difference between approach #1 (encoding it into the value) and approach #2 
(adding it to the column qualifier) is that approach #1 will mix different-type 
values into a single time series (column), and approach #2 will effectively 
create two separate time series (columns). The rest is the fallout.


> Change the way metric values are stored in HBase Storage
> 
>
> Key: YARN-4053
> URL: https://issues.apache.org/jira/browse/YARN-4053
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: YARN-4053-YARN-2928.01.patch
>
>
> Currently the HBase implementation uses GenericObjectMapper to convert and
> store values in the backend HBase storage. This converts everything into a
> string representation (an ASCII/UTF-8 encoded byte array).
> While this is fine in most cases, it does not quite serve our use case for
> metrics.
> So we need to decide how we are going to encode and decode metric values and
> store them in HBase.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3817) [Aggregation] Flow and User level aggregation on Application States table

2015-08-27 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717861#comment-14717861
 ] 

Li Lu commented on YARN-3817:
-

Oh BTW the patch is based on YARN-3816-YARN-2928-v1.patch in YARN-3816. 

> [Aggregation] Flow and User level aggregation on Application States table
> -
>
> Key: YARN-3817
> URL: https://issues.apache.org/jira/browse/YARN-3817
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Li Lu
> Attachments: Detail Design for Flow and User Level Aggregation.pdf, 
> YARN-3817-poc-v1.patch
>
>
> We need time-based flow/user level aggregation to present flow/user related
> states to end users.
> Flow level aggregation represents summary info for a specific flow. User
> level aggregation represents summary info for a specific user; it should
> include summary info of accumulated values and statistical means (at two
> levels: application and flow), such as the number of flows, applications,
> resource consumption, resource means per app or flow, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3817) [Aggregation] Flow and User level aggregation on Application States table

2015-08-27 Thread Li Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-3817:

Attachment: YARN-3817-poc-v1.patch

I'm attaching the first POC patch of our Phoenix-based offline aggregator. The
current patch adds a MapReduce-based offline aggregator that gathers
information from our HBase storage, performs the flow- and user-based
aggregation, and writes the aggregated data back to Phoenix. Generally, the
expected input to the offline aggregator is a list of flows (the active flows
of the past time period, or a specially created list of flows within a given
time window). The offline aggregator first aggregates all flow run data for
each flow in both the mapper and the reducer, then writes them back into
Phoenix. Meanwhile, the aggregated data is passed along to the user level
aggregation, which performs aggregations similar to the flow aggregations.
There is a TimelineEntityWritable class to transfer TimelineEntities.
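
For illustration only, a toy skeleton of the map/reduce shape described above,
keyed by flow. The text input format and the single summed metric are made up;
the actual POC reads TimelineEntities from HBase and writes the aggregates to
Phoenix:

{code}
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: key each record by its flow so one reduce call sees one flow.
class FlowKeyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
  @Override
  protected void map(LongWritable offset, Text line, Context ctx)
      throws IOException, InterruptedException {
    // hypothetical input line: "<cluster>!<user>!<flow> <metricValue>"
    String[] parts = line.toString().split("\\s+");
    ctx.write(new Text(parts[0]), new LongWritable(Long.parseLong(parts[1])));
  }
}

// Reducer: roll per-run metric values up into a flow-level aggregate.
class FlowAggregateReducer
    extends Reducer<Text, LongWritable, Text, LongWritable> {
  @Override
  protected void reduce(Text flowKey, Iterable<LongWritable> values, Context ctx)
      throws IOException, InterruptedException {
    long sum = 0;
    for (LongWritable v : values) {
      sum += v.get();
    }
    ctx.write(flowKey, new LongWritable(sum));
  }
}
{code}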

Some TODOs:
1. Centralize some of the HBase reader related code for both the aggregation 
hbase reader and the hbase reader. 
2. Create a "trigger" to launch the aggregator in a timely or ad-hoc fashion. 
3. Separate configs. 
4. Support aggregation on a specific time period. 
5. More tests. 

Future TODOs: 
Reorganize our storage package and unit tests

Some extra work performed in this patch:
1. No longer storing info fields in Phoenix writer. 
2. Escaping special characters in Phoenix writer by quoting all column names 
(according to Phoenix team's suggestion). 
3. Centralizing tests for aggregation and Phoenix. 
4. Remove unused TestTimelineWriterUtil. 


> [Aggregation] Flow and User level aggregation on Application States table
> -
>
> Key: YARN-3817
> URL: https://issues.apache.org/jira/browse/YARN-3817
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Li Lu
> Attachments: Detail Design for Flow and User Level Aggregation.pdf, 
> YARN-3817-poc-v1.patch
>
>
> We need time-based flow/user level aggregation to present flow/user related
> states to end users.
> Flow level aggregation represents summary info for a specific flow. User
> level aggregation represents summary info for a specific user; it should
> include summary info of accumulated values and statistical means (at two
> levels: application and flow), such as the number of flows, applications,
> resource consumption, resource means per app or flow, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-27 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717855#comment-14717855
 ] 

Sangjin Lee commented on YARN-4074:
---

Thanks [~gtCarrera9] for your comments.

{quote}
As a general question, since we're returning our timeline entities as JSON in
our web service, we need to somehow "rebuild" those entities on the JS client
side, right? If this is the case, do we need to provide some JS object model to
be consistent with our TimelineEntity object model? I'm not a front-end expert,
so I'd like to learn the typical practice for this problem.
{quote}
I'm not intimately familiar with that either. I hope someone who's familiar 
could comment?

I'm going to do some refactoring to move away from the if-else branch (yuck).
There are aspects such as input validation, getting results from HBase, and
creating the entity objects that can be isolated more cleanly. I need to give
some more thought to how to encapsulate that. This has some bearing on the
filter-related work that Varun is doing, so I'll try not to touch that area in
this JIRA.

One thing I forgot to mention is that the current POC patch is a diff against 
the patch for YARN-3901, to be able to isolate the changes for this JIRA. The 
patch for YARN-3901 needs to be reviewed and committed before this can be. 
That's why this patch is missing what's included in the YARN-3901 patch.

> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4074-YARN-2928.POC.001.patch, 
> YARN-4074-YARN-2928.POC.002.patch
>
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4088) RM should be able to process heartbeats from NM asynchronously

2015-08-27 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717831#comment-14717831
 ] 

Bikas Saha commented on YARN-4088:
--

Why not on a 3K cluster? We could slow down heartbeats to (say) 10s on a
3K-node cluster. That should work, though I agree that NM info would be stale
for longer, if that's your point.

> RM should be able to process heartbeats from NM asynchronously
> --
>
> Key: YARN-4088
> URL: https://issues.apache.org/jira/browse/YARN-4088
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, scheduler
>Reporter: Srikanth Kandula
>
> Today, the RM sequentially processes one heartbeat after another. 
> Imagine a 3000 server cluster with each server heart-beating every 3s. This 
> gives the RM 1ms on average to process each NM heartbeat. That is tough.
> It is true that there are several underlying datastructures that will be 
> touched during heartbeat processing. So, it is non-trivial to parallelize the 
> NM heartbeat. Yet, it is quite doable...
> Parallelizing the NM heartbeat would substantially improve the scalability of 
> the RM, allowing it to either 
> a) run larger clusters or 
> b) support faster heartbeats or dynamic scaling of heartbeats
> c) take more asks from each application or 
> d) use cleverer/ more expensive algorithms such as node labels or better 
> packing or ...
> Indeed the RM's scalability limit has been cited as the motivating reason for 
> a variety of efforts which will become less needed if this can be solved. 
> Ditto for slow heartbeats.  See Sparrow and Mercury papers for example.
> Can we take a shot at this?
> If not, could we discuss why.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4088) RM should be able to process heartbeats from NM asynchronously

2015-08-27 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717798#comment-14717798
 ] 

Srikanth Kandula commented on YARN-4088:


Yes, concurrently. Your suggestion is a good one, in that it does give the RM
more time to be clever on small clusters. But no such luck on, say, a 3K-server
cluster. Avoiding serialization may be the answer to most other problems.

> RM should be able to process heartbeats from NM asynchronously
> --
>
> Key: YARN-4088
> URL: https://issues.apache.org/jira/browse/YARN-4088
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, scheduler
>Reporter: Srikanth Kandula
>
> Today, the RM sequentially processes one heartbeat after another. 
> Imagine a 3000 server cluster with each server heart-beating every 3s. This 
> gives the RM 1ms on average to process each NM heartbeat. That is tough.
> It is true that there are several underlying datastructures that will be 
> touched during heartbeat processing. So, it is non-trivial to parallelize the 
> NM heartbeat. Yet, it is quite doable...
> Parallelizing the NM heartbeat would substantially improve the scalability of 
> the RM, allowing it to either 
> a) run larger clusters or 
> b) support faster heartbeats or dynamic scaling of heartbeats
> c) take more asks from each application or 
> c) use cleverer/ more expensive algorithms such as node labels or better 
> packing or ...
> Indeed the RM's scalability limit has been cited as the motivating reason for 
> a variety of efforts which will become less needed if this can be solved. 
> Ditto for slow heartbeats.  See Sparrow and Mercury papers for example.
> Can we take a shot at this?
> If not, could we discuss why.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3250) Support admin cli interface in for Application Priority

2015-08-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717785#comment-14717785
 ] 

Hudson commented on YARN-3250:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2262 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2262/])
YARN-3250. Support admin cli interface in for Application Priority. Contributed 
by Rohith Sharma K S (jianhe: rev a9c8ea71aa427ff5f25caec98be15bc880e578a7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerAdministrationProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/resourcemanager_administration_protocol.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RefreshClusterMaxPriorityRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RefreshClusterMaxPriorityResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/client/ResourceManagerAdministrationProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RefreshClusterMaxPriorityResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RefreshClusterMaxPriorityRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceManagerAdministrationProtocolPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java


> Support admin cli interface in for Application Priority
> ---
>
> Key: YARN-3250
> URL: https://issues.apache.org/jira/browse/YARN-3250
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Rohith Sharma K S
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3250-V1.patch, 0002-YARN-3250.patch, 
> 0003-YARN-3250.patch
>
>
> Current Application Priority Manager supports only configuration via file. 
> To support runtime configurations for admin cli and REST, a common management 
> interface has to be added which can be shared with NodeLabelsManager. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-27 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717758#comment-14717758
 ] 

Li Lu commented on YARN-4074:
-

Thanks [~sjlee0]! I looked at the current POC patch and have some comments:
# In general, I'm OK with this approach. I think the current FlowEntity design
should provide sufficient information for the web UI POC.
# As a general question, since we're returning our timeline entities as JSON in
our web service, we need to somehow "rebuild" those entities on the JS client
side, right? If this is the case, do we need to provide some JS object model to
be consistent with our TimelineEntity object model? I'm not a front-end expert,
so I'd like to learn the typical practice for this problem.
# Please make sure, in the final patch, to change the timeline schema creator
so that we're consistent with the list of tables. Maybe we'd like to find some
better ways to keep all these tables consistent across the writer, reader and
schema creator in the future.
# I agree with all of you that we may want to refactor the current
implementation. For example, we may not want to dispatch an incoming timeline
entity to different tables via a list of if-statements (deciding which table an
entity goes to has already caused me some confusion when working on the offline
aggregator patch rebase). Also, the parsing logic could easily be isolated, I
believe?
# Some changes in files like FlowActivityRowKey.java are not included in this
patch?


> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4074-YARN-2928.POC.001.patch, 
> YARN-4074-YARN-2928.POC.002.patch
>
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4088) RM should be able to process heartbeats from NM asynchronously

2015-08-27 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717746#comment-14717746
 ] 

Bikas Saha commented on YARN-4088:
--

Is the suggestion to process them concurrently? Not quite sure what async
means here. Is it async wrt the RPC thread?
Another alternative would be to dynamically adjust the NM heartbeat interval.
IIRC, the NM's next heartbeat interval is sent by the RM in the response to the
heartbeat; if not, this could be added. The RM could potentially increase this
interval till it reaches a steady/stable state of heartbeat processing. This
would help in self-adjusting to cluster sizes: small intervals for small
clusters and larger intervals for large clusters. The interval could tune up
under high load and then tune down once the load diminishes.
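
A back-of-the-envelope sketch of the self-adjusting idea above: pick the next
interval the RM hands back in the heartbeat response based on how far behind
it is. The backlog measure and the linear shape are purely illustrative:

{code}
public class HeartbeatIntervalPolicy {
  private final long minIntervalMs;
  private final long maxIntervalMs;

  public HeartbeatIntervalPolicy(long minIntervalMs, long maxIntervalMs) {
    this.minIntervalMs = minIntervalMs;
    this.maxIntervalMs = maxIntervalMs;
  }

  // An empty backlog keeps the minimum interval; a deep backlog pushes the
  // interval toward the maximum, easing load until the RM catches up.
  public long nextIntervalMs(int pendingHeartbeats, int comfortableBacklog) {
    double load = Math.min(1.0, (double) pendingHeartbeats / comfortableBacklog);
    return minIntervalMs + Math.round(load * (maxIntervalMs - minIntervalMs));
  }
}
{code}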

> RM should be able to process heartbeats from NM asynchronously
> --
>
> Key: YARN-4088
> URL: https://issues.apache.org/jira/browse/YARN-4088
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager, scheduler
>Reporter: Srikanth Kandula
>
> Today, the RM sequentially processes one heartbeat after another. 
> Imagine a 3000 server cluster with each server heart-beating every 3s. This 
> gives the RM 1ms on average to process each NM heartbeat. That is tough.
> It is true that there are several underlying datastructures that will be 
> touched during heartbeat processing. So, it is non-trivial to parallelize the 
> NM heartbeat. Yet, it is quite doable...
> Parallelizing the NM heartbeat would substantially improve the scalability of 
> the RM, allowing it to either 
> a) run larger clusters or 
> b) support faster heartbeats or dynamic scaling of heartbeats
> c) take more asks from each application or 
> d) use cleverer/ more expensive algorithms such as node labels or better 
> packing or ...
> Indeed the RM's scalability limit has been cited as the motivating reason for 
> a variety of efforts which will become less needed if this can be solved. 
> Ditto for slow heartbeats.  See Sparrow and Mercury papers for example.
> Can we take a shot at this?
> If not, could we discuss why.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4043) Change logging of warning message : "an attempt to override final parameter:"

2015-08-27 Thread Spandan Dutta (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Spandan Dutta updated YARN-4043:

Description: 
In the following 
[function|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java#L2739].
 

When the attr is in the list of final attrs it just outputs this message 
without actually updating any resources as per my understanding. 

We should change this to debug logging.

  was:
In the following 
[function|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java#L2739].
 

When the attr is in the list of final attrs it just outputs this message 
without actually updating any resources as per my understanding. 

We should remove this warning.


> Change logging of warning message : "an attempt to override final parameter:"
> -
>
> Key: YARN-4043
> URL: https://issues.apache.org/jira/browse/YARN-4043
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Spandan Dutta
>Assignee: Spandan Dutta
> Attachments: warn-msg.patch, warn-msg.patch
>
>
> In the following 
> [function|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java#L2739].
>  
> When the attr is in the list of final attrs it just outputs this message 
> without actually updating any resources as per my understanding. 
> We should change this to debug logging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4043) Change logging of warning message : "an attempt to override final parameter:"

2015-08-27 Thread Spandan Dutta (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Spandan Dutta updated YARN-4043:

Summary: Change logging of warning message : "an attempt to override final 
parameter:"  (was: Unnecessary warning message : "an attempt to override final 
parameter:")

> Change logging of warning message : "an attempt to override final parameter:"
> -
>
> Key: YARN-4043
> URL: https://issues.apache.org/jira/browse/YARN-4043
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Spandan Dutta
>Assignee: Spandan Dutta
> Attachments: warn-msg.patch, warn-msg.patch
>
>
> In the following 
> [function|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java#L2739].
>  
> When the attr is in the list of final attrs it just outputs this message 
> without actually updating any resources as per my understanding. 
> We should remove this warning.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-27 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717721#comment-14717721
 ] 

Junping Du commented on YARN-4074:
--

OK. Having a separate JIRA to track this refactoring work should be fine.
Thanks for pointing to that JIRA.

> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4074-YARN-2928.POC.001.patch, 
> YARN-4074-YARN-2928.POC.002.patch
>
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-27 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717681#comment-14717681
 ] 

Li Lu commented on YARN-4074:
-

Hi [~sjlee0], so far the first option looks good to me. The upside is that it
fits our web UI POC requirements fine, and it's relatively clean to maintain.
The downside is that in order to support some complex use cases, we will need
to do some composition. For the current stage I think it's fine, and we can
use it to bootstrap our web UI renderers.

> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4074-YARN-2928.POC.001.patch, 
> YARN-4074-YARN-2928.POC.002.patch
>
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default

2015-08-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717666#comment-14717666
 ] 

Hadoop QA commented on YARN-4087:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  18m 23s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 51s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  3s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   2m  0s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m 12s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   1m 59s | Tests passed in 
hadoop-yarn-common. |
| | |  46m 22s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752862/YARN-4087.2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a9c8ea7 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8932/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8932/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8932/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8932/console |


This message was automatically generated.

> Set YARN_FAIL_FAST to be false by default
> -
>
> Key: YARN-4087
> URL: https://issues.apache.org/jira/browse/YARN-4087
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-4087.1.patch, YARN-4087.2.patch
>
>
> Increasingly, I feel setting this property to be false makes more sense 
> especially in production environment, 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3250) Support admin cli interface in for Application Priority

2015-08-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717664#comment-14717664
 ] 

Hudson commented on YARN-3250:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #305 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/305/])
YARN-3250. Support admin cli interface in for Application Priority. Contributed 
by Rohith Sharma K S (jianhe: rev a9c8ea71aa427ff5f25caec98be15bc880e578a7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/client/ResourceManagerAdministrationProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RefreshClusterMaxPriorityResponsePBImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerAdministrationProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RefreshClusterMaxPriorityRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceManagerAdministrationProtocolPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/resourcemanager_administration_protocol.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RefreshClusterMaxPriorityRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RefreshClusterMaxPriorityResponse.java


> Support admin cli interface in for Application Priority
> ---
>
> Key: YARN-3250
> URL: https://issues.apache.org/jira/browse/YARN-3250
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Rohith Sharma K S
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3250-V1.patch, 0002-YARN-3250.patch, 
> 0003-YARN-3250.patch
>
>
> Current Application Priority Manager supports only configuration via file. 
> To support runtime configurations for admin cli and REST, a common management 
> interface has to be added which can be shared with NodeLabelsManager. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3920) FairScheduler Reserving a node for a container should be configurable to allow it used only for large containers

2015-08-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717659#comment-14717659
 ] 

Hadoop QA commented on YARN-3920:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  15m 57s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   8m  2s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 10s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 41s | The applied patch generated  
127 new checkstyle issues (total was 0, now 127). |
| {color:red}-1{color} | whitespace |   0m  1s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 33s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  53m 43s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  92m 44s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752847/YARN-3920.004.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a9c8ea7 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8931/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8931/artifact/patchprocess/whitespace.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8931/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8931/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8931/console |


This message was automatically generated.

> FairScheduler Reserving a node for a container should be configurable to 
> allow it used only for large containers
> 
>
> Key: YARN-3920
> URL: https://issues.apache.org/jira/browse/YARN-3920
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3920.004.patch, YARN-3920.004.patch, 
> YARN-3920.004.patch, YARN-3920.004.patch, yARN-3920.001.patch, 
> yARN-3920.002.patch, yARN-3920.003.patch
>
>
> Reserving a node for a container was designed for preventing large containers 
> from starvation from small requests that keep getting into a node. Today we 
> let this be used even for a small container request. This has a huge impact 
> on scheduling since we block other scheduling requests until that reservation 
> is fulfilled. We should make this configurable so its impact can be minimized 
> by limiting it for large container requests as originally intended. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-27 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-4074:
--
Attachment: YARN-4074-YARN-2928.POC.002.patch

Posting a v.2 POC patch. This adds the flow run query.

As for [~djp]'s comments, yes, I agree that the reader code needs more serious 
refactoring, both in the API and in the implementation.

I believe [~varun_saxena] is looking into cleaning up the filters and so on in 
YARN-3863. So improving the API would be taken up by Varun. Varun?

I'd also like to restructure the implementation further. This POC patch is by 
no means an indication of its final form. I just wanted to get it out there so 
we can ensure it is correct and discuss the approach taken here. I hope that 
clarifies things a bit.

> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4074-YARN-2928.POC.001.patch, 
> YARN-4074-YARN-2928.POC.002.patch
>
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3250) Support admin cli interface in for Application Priority

2015-08-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717634#comment-14717634
 ] 

Hudson commented on YARN-3250:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #1046 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1046/])
YARN-3250. Support admin cli interface in for Application Priority. Contributed 
by Rohith Sharma K S (jianhe: rev a9c8ea71aa427ff5f25caec98be15bc880e578a7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/client/ResourceManagerAdministrationProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceManagerAdministrationProtocolPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RefreshClusterMaxPriorityResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RefreshClusterMaxPriorityRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerAdministrationProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RefreshClusterMaxPriorityRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/resourcemanager_administration_protocol.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RefreshClusterMaxPriorityResponsePBImpl.java


> Support admin cli interface in for Application Priority
> ---
>
> Key: YARN-3250
> URL: https://issues.apache.org/jira/browse/YARN-3250
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Rohith Sharma K S
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3250-V1.patch, 0002-YARN-3250.patch, 
> 0003-YARN-3250.patch
>
>
> Current Application Priority Manager supports only configuration via file. 
> To support runtime configurations for admin cli and REST, a common management 
> interface has to be added which can be shared with NodeLabelsManager. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3250) Support admin cli interface in for Application Priority

2015-08-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717639#comment-14717639
 ] 

Hudson commented on YARN-3250:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #318 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/318/])
YARN-3250. Support admin cli interface in for Application Priority. Contributed 
by Rohith Sharma K S (jianhe: rev a9c8ea71aa427ff5f25caec98be15bc880e578a7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/resourcemanager_administration_protocol.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RefreshClusterMaxPriorityResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceManagerAdministrationProtocolPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerAdministrationProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RefreshClusterMaxPriorityResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/client/ResourceManagerAdministrationProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RefreshClusterMaxPriorityRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RefreshClusterMaxPriorityRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java


> Support admin cli interface in for Application Priority
> ---
>
> Key: YARN-3250
> URL: https://issues.apache.org/jira/browse/YARN-3250
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Rohith Sharma K S
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3250-V1.patch, 0002-YARN-3250.patch, 
> 0003-YARN-3250.patch
>
>
> Current Application Priority Manager supports only configuration via file. 
> To support runtime configurations for admin cli and REST, a common management 
> interface has to be added which can be shared with NodeLabelsManager. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4087) Set YARN_FAIL_FAST to be false by default

2015-08-27 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-4087:
--
Attachment: YARN-4087.2.patch

> Set YARN_FAIL_FAST to be false by default
> -
>
> Key: YARN-4087
> URL: https://issues.apache.org/jira/browse/YARN-4087
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-4087.1.patch, YARN-4087.2.patch
>
>
> Increasingly, I feel setting this property to be false makes more sense 
> especially in production environment, 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default

2015-08-27 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717568#comment-14717568
 ] 

Jian He commented on YARN-4087:
---

YARN_FAIL_FAST is a global knob that controls all components, e.g. the RM and 
NM; the config description provides that clarification. I just can't think of a 
more concise and meaningful name; any naming suggestion is welcome.

Updated the patch to clarify the config description further.

> Set YARN_FAIL_FAST to be false by default
> -
>
> Key: YARN-4087
> URL: https://issues.apache.org/jira/browse/YARN-4087
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-4087.1.patch
>
>
> Increasingly, I feel setting this property to be false makes more sense 
> especially in production environment, 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4009) CORS support for ResourceManager REST API

2015-08-27 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717539#comment-14717539
 ] 

Vinod Kumar Vavilapalli commented on YARN-4009:
---

Cross origin support exists for Timeline Service V1, linking related tickets.

> CORS support for ResourceManager REST API
> -
>
> Key: YARN-4009
> URL: https://issues.apache.org/jira/browse/YARN-4009
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Prakash Ramachandran
>
> Currently the REST API's do not have CORS support. This means any UI (running 
> in browser) cannot consume the REST API's. For ex Tez UI would like to use 
> the REST API for getting application, application attempt information exposed 
> by the API's. 
> It would be very useful if CORS is enabled for the REST API's.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default

2015-08-27 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717526#comment-14717526
 ] 

Hitesh Shah commented on YARN-4087:
---

It would be good to rename the config property to something that provides a bit 
more clarity on what the config knob is meant to control. 

> Set YARN_FAIL_FAST to be false by default
> -
>
> Key: YARN-4087
> URL: https://issues.apache.org/jira/browse/YARN-4087
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-4087.1.patch
>
>
> Increasingly, I feel setting this property to be false makes more sense 
> especially in production environment, 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3250) Support admin cli interface in for Application Priority

2015-08-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717522#comment-14717522
 ] 

Hudson commented on YARN-3250:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #313 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/313/])
YARN-3250. Support admin cli interface in for Application Priority. Contributed 
by Rohith Sharma K S (jianhe: rev a9c8ea71aa427ff5f25caec98be15bc880e578a7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RefreshClusterMaxPriorityResponse.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/client/ResourceManagerAdministrationProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RefreshClusterMaxPriorityRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/resourcemanager_administration_protocol.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RefreshClusterMaxPriorityRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerAdministrationProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RefreshClusterMaxPriorityResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceManagerAdministrationProtocolPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java


> Support admin cli interface in for Application Priority
> ---
>
> Key: YARN-3250
> URL: https://issues.apache.org/jira/browse/YARN-3250
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Rohith Sharma K S
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3250-V1.patch, 0002-YARN-3250.patch, 
> 0003-YARN-3250.patch
>
>
> Current Application Priority Manager supports only configuration via file. 
> To support runtime configurations for admin cli and REST, a common management 
> interface has to be added which can be shared with NodeLabelsManager. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4085) Generate file with container resource limits in the container work dir

2015-08-27 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717524#comment-14717524
 ] 

Hitesh Shah commented on YARN-4085:
---

Could the values be set in the environment instead of a file? If a file, should 
that be a properties file with all useful information written into it, and not 
just the resource size info? 
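
To make the properties-file idea concrete, a rough sketch of what the NM could 
write into the container work dir (the file name and keys below are made up for 
illustration, not an agreed format):

{code}
// Hypothetical sketch only: "container-limits.properties" and the keys are
// illustrative, not an agreed-upon format.
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Properties;

public class ContainerLimitsFileSketch {
  public static void main(String[] args) throws IOException {
    Properties limits = new Properties();
    // resource size info
    limits.setProperty("container.resource.memory-mb", "2048");
    limits.setProperty("container.resource.vcores", "1");
    // other "useful information" could go alongside the resource sizes
    limits.setProperty("container.id", "container_e01_1440000000000_0001_01_000002");
    try (FileOutputStream out = new FileOutputStream("container-limits.properties")) {
      limits.store(out, "resource limits imposed on this container (sketch)");
    }
  }
}
{code}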

> Generate file with container resource limits in the container work dir
> --
>
> Key: YARN-4085
> URL: https://issues.apache.org/jira/browse/YARN-4085
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>Priority: Minor
>
> Currently, a container doesn't know what resource limits are being imposed on 
> it. It would be helpful if the NM generated a simple file in the container 
> work dir with the resource limits specified.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3920) FairScheduler Reserving a node for a container should be configurable to allow it used only for large containers

2015-08-27 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-3920:

Attachment: YARN-3920.004.patch

> FairScheduler Reserving a node for a container should be configurable to 
> allow it used only for large containers
> 
>
> Key: YARN-3920
> URL: https://issues.apache.org/jira/browse/YARN-3920
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
> Attachments: YARN-3920.004.patch, YARN-3920.004.patch, 
> YARN-3920.004.patch, YARN-3920.004.patch, yARN-3920.001.patch, 
> yARN-3920.002.patch, yARN-3920.003.patch
>
>
> Reserving a node for a container was designed for preventing large containers 
> from starvation from small requests that keep getting into a node. Today we 
> let this be used even for a small container request. This has a huge impact 
> on scheduling since we block other scheduling requests until that reservation 
> is fulfilled. We should make this configurable so its impact can be minimized 
> by limiting it for large container requests as originally intended. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3250) Support admin cli interface in for Application Priority

2015-08-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717480#comment-14717480
 ] 

Hudson commented on YARN-3250:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8359 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8359/])
YARN-3250. Support admin cli interface in for Application Priority. Contributed 
by Rohith Sharma K S (jianhe: rev a9c8ea71aa427ff5f25caec98be15bc880e578a7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/client/ResourceManagerAdministrationProtocolPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/ResourceManagerAdministrationProtocol.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/AdminService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RefreshClusterMaxPriorityRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/yarn_server_resourcemanager_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RefreshClusterMaxPriorityResponsePBImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/YarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAdminService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestRMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceManagerAdministrationProtocolPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/RefreshClusterMaxPriorityResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/server/resourcemanager_administration_protocol.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/RMAdminCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/RefreshClusterMaxPriorityRequestPBImpl.java


> Support admin cli interface in for Application Priority
> ---
>
> Key: YARN-3250
> URL: https://issues.apache.org/jira/browse/YARN-3250
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Sunil G
>Assignee: Rohith Sharma K S
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3250-V1.patch, 0002-YARN-3250.patch, 
> 0003-YARN-3250.patch
>
>
> Current Application Priority Manager supports only configuration via file. 
> To support runtime configurations for admin cli and REST, a common management 
> interface has to be added which can be shared with NodeLabelsManager. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4036) Findbugs warnings in hadoop-yarn-server-common

2015-08-27 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4036:
---
Attachment: (was: findbugs_report.html)

> Findbugs warnings in hadoop-yarn-server-common
> --
>
> Key: YARN-4036
> URL: https://issues.apache.org/jira/browse/YARN-4036
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.0
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>
> Refer to 
> https://issues.apache.org/jira/browse/YARN-3232?focusedCommentId=14679146&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14679146



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1012) Report NM aggregated container resource utilization in heartbeat

2015-08-27 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717317#comment-14717317
 ] 

Srikanth Kandula commented on YARN-1012:


Ack. Will do.

> Report NM aggregated container resource utilization in heartbeat
> 
>
> Key: YARN-1012
> URL: https://issues.apache.org/jira/browse/YARN-1012
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: Arun C Murthy
>Assignee: Inigo Goiri
> Fix For: 2.8.0
>
> Attachments: YARN-1012-1.patch, YARN-1012-10.patch, 
> YARN-1012-11.patch, YARN-1012-2.patch, YARN-1012-3.patch, YARN-1012-4.patch, 
> YARN-1012-5.patch, YARN-1012-6.patch, YARN-1012-7.patch, YARN-1012-8.patch, 
> YARN-1012-9.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4081) Add support for multiple resource types in the Resource class

2015-08-27 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717294#comment-14717294
 ] 

Srikanth Kandula commented on YARN-4081:


Ease of expression is a great thing to have. So is extending to multiple 
resources. That is all cool.

I am mostly worried about the performance impact of replacing a small data 
structure that has native types with a much larger data structure that has 
user-defined types. Could you run a profile? How much more space would a 
resource object take up now? How much more time would it take to initialize and 
garbage collect 10K resource objects?
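
For illustration, the kind of micro-measurement I mean, as a rough sketch (the 
two classes below are stand-ins, not the actual Resource implementations):

{code}
// Sketch only: allocate 10K resource-like objects and compare a primitive-field
// layout with a map-based layout. Purely illustrative timings; a real profile
// would also look at retained size and GC behaviour.
import java.util.HashMap;
import java.util.Map;

public class ResourceAllocBench {

  static final class PrimitiveResource {      // current style: native fields
    final long memoryMB;
    final int vcores;
    PrimitiveResource(long m, int v) { memoryMB = m; vcores = v; }
  }

  static final class MapResource {            // proposed style: named resource types
    final Map<String, Long> values = new HashMap<String, Long>();
    MapResource(long m, long v) {
      values.put("memory-mb", m);
      values.put("vcores", v);
    }
  }

  public static void main(String[] args) {
    final int n = 10_000;
    long t0 = System.nanoTime();
    PrimitiveResource[] a = new PrimitiveResource[n];
    for (int i = 0; i < n; i++) a[i] = new PrimitiveResource(1024, 1);
    long t1 = System.nanoTime();
    MapResource[] b = new MapResource[n];
    for (int i = 0; i < n; i++) b[i] = new MapResource(1024, 1);
    long t2 = System.nanoTime();
    System.out.printf("primitive: %.2f ms, map-based: %.2f ms%n",
        (t1 - t0) / 1e6, (t2 - t1) / 1e6);
  }
}
{code}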

> Add support for multiple resource types in the Resource class
> -
>
> Key: YARN-4081
> URL: https://issues.apache.org/jira/browse/YARN-4081
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4081-YARN-3926.001.patch
>
>
> For adding support for multiple resource types, we need to add support for 
> this in the Resource class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4056) Bundling: Searching for multiple containers in a single pass over {queues, applications, priorities}

2015-08-27 Thread Srikanth Kandula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717267#comment-14717267
 ] 

Srikanth Kandula commented on YARN-4056:


I looked. Sort of similar, but not really. The similarity is that both allow 
multiple containers to be allocated within fewer calls. 

The difference is in the policies and the complexity. Bundling allows an 
arbitrary subset of 'legit' tasks to be assigned, whereas assignMultiple simply 
assigns the first few. For example, bundling can decide that the 2nd, 3rd and 
10th tasks are a good choice, in contrast to assigning just the 1st task (the 
others may not fit). assignMultiple does not allow for this.

Bundling is slightly more complex because the actual assignment is deferred 
until the loop finishes, whereas assignMultiple assigns each task in place and 
keeps going.

The patch is with [~chris.douglas] for an internal review.

We are pushing out a bundler that mimics the current scheduler. All the tests 
pass and there is no performance change, as expected. Note however that the 
allocations are still deferred.

Better bundlers are in the works.
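
To illustrate the deferred-assignment point, a rough sketch (the types and 
names are placeholders, not the actual scheduler code):

{code}
// Sketch only: pick any fitting subset of tasks in a single pass, but commit
// the whole bundle after the loop instead of assigning each task in place.
import java.util.ArrayList;
import java.util.List;

public class BundlingSketch {

  static class Task {
    final int id;
    final int needed;
    Task(int id, int needed) { this.id = id; this.needed = needed; }
  }

  static List<Task> bundle(List<Task> candidates, int nodeCapacity) {
    List<Task> chosen = new ArrayList<Task>();
    int used = 0;
    for (Task t : candidates) {           // may pick the 2nd, 3rd and 10th task
      if (used + t.needed <= nodeCapacity) {
        chosen.add(t);
        used += t.needed;
      }
      // nothing is actually assigned inside the loop
    }
    return chosen;                        // caller commits the whole bundle here
  }
}
{code}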

> Bundling: Searching for multiple containers in a single pass over {queues, 
> applications, priorities}
> 
>
> Key: YARN-4056
> URL: https://issues.apache.org/jira/browse/YARN-4056
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: capacityscheduler, resourcemanager, scheduler
>Reporter: Srikanth Kandula
>Assignee: Robert Grandl
> Attachments: bundling.docx
>
>
> More than one container is allocated on many NM heartbeats. Yet, the current 
> scheduler allocates exactly one container per iteration over {{queues, 
> applications, priorities}}. When there are many queues, applications, or 
> priorities allocating only one container per iteration can  needlessly 
> increase the duration of the NM heartbeat.
>  
> In this JIRA, we propose bundling. That is, allow arbitrarily many containers 
> to be allocated in a single iteration over {{queues, applications and 
> priorities}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-27 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717257#comment-14717257
 ] 

Junping Du commented on YARN-4074:
--

Thanks for uploading a patch, [~sjlee0]! Sorry for coming late on this, but I 
have a critical question on the TimelineReader interface:
bq. Currently I am not planning to add new flow-specific methods to the 
TimelineReader interface.
If so, how do we query the latest N records with the existing getEntities() 
API? Actually, I think we should refactor the existing getEntities() API before 
things get worse. It includes too many parameters, and most of them are 
optional. This is very unwieldy, easily causes bugs, and is very hard to extend 
in the future. Instead, we should define something like an EntityFilter class 
to hold most of these optional fields (including the time range, top N, 
info/config/metric sub-filters, etc.), which can also be extended easily for 
other filters in the future. Thoughts?

I'm still walking through your POC patch; more comments to come.
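
To make the {{EntityFilter}} idea concrete, a rough sketch (the fields and 
method names are illustrative only, not a proposed final API):

{code}
// Sketch only: bundles the optional arguments of getEntities() so the reader
// API stays small and new filters can be added without new parameters.
import java.util.Collections;
import java.util.Map;
import java.util.Set;

public final class EntityFilter {
  private final Long createdTimeBegin;   // time-range sub-filter
  private final Long createdTimeEnd;
  private final Long limit;              // "latest N" records
  private final Map<String, Object> infoFilters;
  private final Map<String, String> configFilters;
  private final Set<String> metricFilters;

  private EntityFilter(Builder b) {
    this.createdTimeBegin = b.createdTimeBegin;
    this.createdTimeEnd = b.createdTimeEnd;
    this.limit = b.limit;
    this.infoFilters = b.infoFilters;
    this.configFilters = b.configFilters;
    this.metricFilters = b.metricFilters;
  }

  public Long getCreatedTimeBegin() { return createdTimeBegin; }
  public Long getCreatedTimeEnd() { return createdTimeEnd; }
  public Long getLimit() { return limit; }
  public Map<String, Object> getInfoFilters() { return infoFilters; }
  public Map<String, String> getConfigFilters() { return configFilters; }
  public Set<String> getMetricFilters() { return metricFilters; }

  public static final class Builder {
    private Long createdTimeBegin;
    private Long createdTimeEnd;
    private Long limit;
    private Map<String, Object> infoFilters = Collections.emptyMap();
    private Map<String, String> configFilters = Collections.emptyMap();
    private Set<String> metricFilters = Collections.emptySet();

    public Builder createdTimeRange(long begin, long end) {
      this.createdTimeBegin = begin;
      this.createdTimeEnd = end;
      return this;
    }
    public Builder limit(long n) { this.limit = n; return this; }
    public Builder infoFilters(Map<String, Object> f) { this.infoFilters = f; return this; }
    public Builder configFilters(Map<String, String> f) { this.configFilters = f; return this; }
    public Builder metricFilters(Set<String> f) { this.metricFilters = f; return this; }
    public EntityFilter build() { return new EntityFilter(this); }
  }
}
{code}

A reader method would then take a single filter argument, e.g. 
{{getEntities(context, new EntityFilter.Builder().limit(10).build())}}, instead 
of a dozen optional parameters.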



> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4074-YARN-2928.POC.001.patch
>
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4091) Improvement: Introduce more debug/diagnostics information to detail out scheduler activity

2015-08-27 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4091:
--
Attachment: Improvement on debugdiagnostic information - YARN.pdf

> Improvement: Introduce more debug/diagnostics information to detail out 
> scheduler activity
> --
>
> Key: YARN-4091
> URL: https://issues.apache.org/jira/browse/YARN-4091
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: Improvement on debugdiagnostic information - YARN.pdf
>
>
> As schedulers are improved with various new capabilities, more configurations 
> that tune the schedulers start to take actions such as limiting the containers 
> assigned to an application, introducing a delay before allocating a container, 
> etc. There is no clear information passed down from the scheduler to the 
> outside world under these various scenarios. This makes debugging much tougher.
> This ticket is an effort to introduce more defined states at the various 
> points in the scheduler where it skips/rejects a container assignment, 
> activates an application, etc. Such information will help users know what is 
> happening in the scheduler.
> Attaching a short proposal for initial discussion. We would like to improve 
> on this as we discuss.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4091) Improvement: Introduce more debug/diagnostics information to detail out scheduler activity

2015-08-27 Thread Sunil G (JIRA)
Sunil G created YARN-4091:
-

 Summary: Improvement: Introduce more debug/diagnostics information 
to detail out scheduler activity
 Key: YARN-4091
 URL: https://issues.apache.org/jira/browse/YARN-4091
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacity scheduler, resourcemanager
Affects Versions: 2.7.0
Reporter: Sunil G
Assignee: Sunil G


As schedulers are improved with various new capabilities, more configurations 
that tune the schedulers start to take actions such as limiting the containers 
assigned to an application, introducing a delay before allocating a container, 
etc. There is no clear information passed down from the scheduler to the 
outside world under these various scenarios. This makes debugging much tougher.

This ticket is an effort to introduce more defined states at the various points 
in the scheduler where it skips/rejects a container assignment, activates an 
application, etc. Such information will help users know what is happening in 
the scheduler.

Attaching a short proposal for initial discussion. We would like to improve on 
this as we discuss.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-27 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717146#comment-14717146
 ] 

Sangjin Lee commented on YARN-4074:
---

cc [~gtCarrera9] and [~vrushalic] also for their thoughts.

There are some options for this, and there are pros and cons. I'm leaning 
towards the current proposal ((1) below) for now, but we could enhance this 
later as the UI jells more.

# do a specific entity query for each of the flow runs obtained from the flow 
activity entity
# return all flow runs (possibly with limits and time windows) for the given 
flow
# do a single query for all flow runs specified as a list of flow run id's

One interesting thing to note is that a flow activity entity (record) is an 
activity of that flow *for a given day*. In other words, there can be multiple 
flow activity entities for the same flow. The flow runs that are returned in 
the flow activity entity are only for that given day.

Then the question is, when I click that flow activity record, what flow runs do 
I expect to see? It's a bit ambiguous, but I think it might make more sense to 
return only the flow runs that are referenced on that particular day if we're 
using the flow activity to render the landing page.

If we assume that, then (2) is probably not needed for this. That leaves us 
with (1) or (3). The benefit of (1) is that it fits easily into the existing 
reader API (getEntity). The downside is that you may need to make multiple 
reader calls to retrieve the flow runs. But normally the number of flow runs in 
a day for a given flow should be very small, so it might not be a big deal.

One hybrid approach may be that the REST API supports URLs based on the list, 
but the web service code makes multiple reader getEntity() calls. We'd still 
need to define the form of the URLs to support that type of query.

Thoughts?
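
A rough sketch of the hybrid approach, just to make it concrete (the URL shape, 
names and signatures below are all made up for illustration, not a proposed 
API):

{code}
// Sketch only: a web-service helper that accepts a comma-separated list of
// flow run ids and issues one reader getEntity() call per id.
import java.util.ArrayList;
import java.util.List;

public class FlowRunQuerySketch {

  /** Minimal stand-in for the real TimelineReader and entity types. */
  interface Reader {
    Object getEntity(String cluster, String user, String flow, String runId);
  }

  // e.g. GET .../flowruns/{cluster}/{user}/{flow}?runids=1,2,10  (made-up URL)
  static List<Object> getFlowRuns(Reader reader, String cluster, String user,
      String flow, String runIds) {
    List<Object> runs = new ArrayList<Object>();
    for (String runId : runIds.split(",")) {
      runs.add(reader.getEntity(cluster, user, flow, runId.trim()));
    }
    return runs;
  }
}
{code}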

> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4074-YARN-2928.POC.001.patch
>
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default

2015-08-27 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717131#comment-14717131
 ] 

Jian He commented on YARN-4087:
---

[~bibinchundatt], the logic is that the default value for RM_FAIL_FAST is 
YARN_FAIL_FAST.
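
In other words, something along these lines (a sketch of the lookup order; the 
constant names are as I remember them from {{YarnConfiguration}}, so treat this 
as illustrative rather than the exact RM code):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class FailFastSketch {
  // The RM-specific knob wins if set; otherwise fall back to the global
  // yarn.fail-fast value, and finally to its built-in default.
  static boolean shouldFailFast(Configuration conf) {
    return conf.getBoolean(YarnConfiguration.RM_FAIL_FAST,
        conf.getBoolean(YarnConfiguration.YARN_FAIL_FAST,
            YarnConfiguration.DEFAULT_YARN_FAIL_FAST));
  }
}
{code}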

> Set YARN_FAIL_FAST to be false by default
> -
>
> Key: YARN-4087
> URL: https://issues.apache.org/jira/browse/YARN-4087
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-4087.1.patch
>
>
> Increasingly, I feel setting this property to be false makes more sense 
> especially in production environment, 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4036) Findbugs warnings in hadoop-yarn-server-common

2015-08-27 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-4036:
---
Attachment: findbugs_report.html

> Findbugs warnings in hadoop-yarn-server-common
> --
>
> Key: YARN-4036
> URL: https://issues.apache.org/jira/browse/YARN-4036
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.0
>Reporter: Varun Saxena
>Assignee: Varun Saxena
> Attachments: findbugs_report.html
>
>
> Refer to 
> https://issues.apache.org/jira/browse/YARN-3232?focusedCommentId=14679146&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14679146



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4074) [timeline reader] implement support for querying for flows and flow runs

2015-08-27 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717073#comment-14717073
 ] 

Varun Saxena commented on YARN-4074:


OK, will have a look. We don't need to support a query like listing all the 
flow runs for a flow?

> [timeline reader] implement support for querying for flows and flow runs
> 
>
> Key: YARN-4074
> URL: https://issues.apache.org/jira/browse/YARN-4074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: YARN-4074-YARN-2928.POC.001.patch
>
>
> Implement support for querying for flows and flow runs.
> We should be able to query for the most recent N flows, etc.
> This includes changes to the {{TimelineReader}} API if necessary, as well as 
> implementation of the API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-08-27 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717065#comment-14717065
 ] 

Varun Saxena commented on YARN-3528:


+1, latest patch LGTM

> Tests with 12345 as hard-coded port break jenkins
> -
>
> Key: YARN-3528
> URL: https://issues.apache.org/jira/browse/YARN-3528
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
> Environment: ASF Jenkins
>Reporter: Steve Loughran
>Assignee: Brahma Reddy Battula
>Priority: Blocker
>  Labels: test
> Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
> YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, 
> YARN-3528-007.patch, YARN-3528-008.patch, YARN-3528.patch
>
>
> A lot of the YARN tests have hard-coded the port 12345 for their services to 
> come up on.
> This makes it impossible to have scheduled or precommit tests to run 
> consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
> appear to get ignored completely.
> A quick grep of "12345" shows up many places in the test suite where this 
> practise has developed.
> * All {{BaseContainerManagerTest}} subclasses
> * {{TestNodeManagerShutdown}}
> * {{TestContainerManager}}
> + others
> This needs to be addressed through portscanning and dynamic port allocation. 
> Please can someone do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3717) Improve RM node labels web UI

2015-08-27 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716967#comment-14716967
 ] 

Naganarasimha G R commented on YARN-3717:
-

Hi [~leftnoteasy], 
It seems the patch is in a reviewable state based on the previous Jenkins 
report. Can you take a look at the patch?

> Improve RM node labels web UI
> -
>
> Key: YARN-3717
> URL: https://issues.apache.org/jira/browse/YARN-3717
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: 3717_cluster_test_snapshots.zip, RMLogsForHungJob.log, 
> YARN-3717.20150822-1.patch, YARN-3717.20150824-1.patch, 
> YARN-3717.20150825-1.patch, YARN-3717.20150826-1.patch
>
>
> 1> Add the default-node-Label expression for each queue in scheduler page.
> 2> In Application/Appattempt page  show the app configured node label 
> expression for AM and Job



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3717) Improve RM node labels web UI

2015-08-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716903#comment-14716903
 ] 

Hadoop QA commented on YARN-3717:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  25m  1s | Pre-patch trunk has 7 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 7 new or modified test files. |
| {color:green}+1{color} | javac |   7m 50s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  2s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | site |   3m  2s | Site still builds. |
| {color:red}-1{color} | checkstyle |   2m 45s | The applied patch generated  3 
new checkstyle issues (total was 16, now 18). |
| {color:green}+1{color} | whitespace |   0m 12s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 31s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   7m 32s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   7m  0s | Tests passed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   2m  1s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   3m 57s | Tests passed in 
hadoop-yarn-server-applicationhistoryservice. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-server-common. |
| {color:green}+1{color} | yarn tests |  58m  0s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | | 131m 34s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752747/YARN-3717.20150826-1.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle site |
| git revision | trunk / 0bf2854 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8930/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8930/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8930/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8930/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8930/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-applicationhistoryservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8930/artifact/patchprocess/testrun_hadoop-yarn-server-applicationhistoryservice.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8930/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8930/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8930/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8930/console |


This message was automatically generated.

> Improve RM node labels web UI
> -
>
> Key: YARN-3717
> URL: https://issues.apache.org/jira/browse/YARN-3717
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: 3717_cluster_test_snapshots.zip, RMLogsForHungJob.log, 
> YARN-3717.20150822-1.patch, YARN-3717.20150824-1.patch, 
> YARN-3717.20150825-1.patch, YARN-3717.20150826-1.patch
>
>
> 1> Add the default-node-Label expression for each queue in scheduler page.
> 2> In Application/Appattempt page  show the app configured node label 
> expression for AM and Job



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java

2015-08-27 Thread Xianyin Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716785#comment-14716785
 ] 

Xianyin Xin commented on YARN-4090:
---

We should also pay attention to the ReadLock.lock() and unlock() calls in the 
first image, which cost a lot of time.

> Make Collections.sort() more efficient in FSParentQueue.java
> 
>
> Key: YARN-4090
> URL: https://issues.apache.org/jira/browse/YARN-4090
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Xianyin Xin
> Attachments: sampling1.jpg, sampling2.jpg
>
>
> Collections.sort() consumes too much time in a scheduling round.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-27 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716784#comment-14716784
 ] 

Bibin A Chundatt commented on YARN-3893:


Testcase failures are not related to this patch.

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
> 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
> 0009-YARN-3893.patch, 0010-YARN-3893.patch, yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716779#comment-14716779
 ] 

Hadoop QA commented on YARN-3893:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 26s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 42s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m  2s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 51s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 32s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  53m 42s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  92m 44s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestAllocationFileLoaderService
 |
|   | 
hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesHttpStaticUserPermissions
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752740/0010-YARN-3893.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 0bf2854 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8929/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8929/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8929/console |


This message was automatically generated.

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
> 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
> 0009-YARN-3893.patch, 0010-YARN-3893.patch, yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java

2015-08-27 Thread Xianyin Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianyin Xin updated YARN-4090:
--
Attachment: sampling2.jpg
sampling1.jpg

I constructed a queue hierarchy with 3 levels:
   root
   child1  child2  child3
child1.child1~10, child2.child1~15, child3.child1~15
so the number of leaf queues is 40, with a total of 1000 apps running randomly on 
the leaf queues. The sampling results show that about 2/3 of the CPU time of 
FSParentQueue.assignContainers() was spent in Collections.sort(). Within 
Collections.sort(), about 40% was spent in 
SchedulerApplicationAttempt.getCurrentConsumption() and about 36% in 
Resources.subtract(). The former is expensive because 
FSParentQueue.getResourceUsage() recurses over its children, while for the 
latter, the clone() inside subtract() takes much CPU time.
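
For illustration only, a minimal, self-contained sketch (class name and numbers 
are made up; this is not FairScheduler code) of the pattern the sampling points 
at: if the comparator recomputes a queue's usage by recursing over its children, 
an O(n log n) sort repeats that recursion on every comparison, whereas 
snapshotting the usage once per queue before sorting avoids it.

{code:title=SortCostSketch.java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class SortCostSketch {
  static class Queue {
    final List<Queue> children = new ArrayList<Queue>();
    long ownUsage;

    // Recomputed on every call: walks the whole subtree, mirroring the
    // recursive getResourceUsage() cost seen in the sampling.
    long usage() {
      long total = ownUsage;
      for (Queue c : children) {
        total += c.usage();
      }
      return total;
    }
  }

  public static void main(String[] args) {
    List<Queue> leaves = new ArrayList<Queue>();
    Random r = new Random(0);
    for (int i = 0; i < 40; i++) {          // 40 leaf queues, as in the test setup
      Queue q = new Queue();
      for (int j = 0; j < 25; j++) {        // each leaf carries some child usage
        Queue app = new Queue();
        app.ownUsage = r.nextInt(1000);
        q.children.add(app);
      }
      leaves.add(q);
    }

    // Costly pattern: usage() (the recursive walk) runs inside every comparison.
    Collections.sort(leaves, new Comparator<Queue>() {
      public int compare(Queue a, Queue b) {
        return Long.compare(a.usage(), b.usage());
      }
    });

    // Cheaper pattern: snapshot usage once per queue, then sort on the snapshot.
    final Map<Queue, Long> snapshot = new IdentityHashMap<Queue, Long>();
    for (Queue q : leaves) {
      snapshot.put(q, q.usage());
    }
    Collections.sort(leaves, new Comparator<Queue>() {
      public int compare(Queue a, Queue b) {
        return snapshot.get(a).compareTo(snapshot.get(b));
      }
    });
  }
}
{code}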

> Make Collections.sort() more efficient in FSParentQueue.java
> 
>
> Key: YARN-4090
> URL: https://issues.apache.org/jira/browse/YARN-4090
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Xianyin Xin
> Attachments: sampling1.jpg, sampling2.jpg
>
>
> Collections.sort() consumes too much time in a scheduling round.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java

2015-08-27 Thread Xianyin Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianyin Xin updated YARN-4090:
--
Summary: Make Collections.sort() more efficient in FSParentQueue.java  
(was: Make Collections.sort() more efficient in FSParent.java)

> Make Collections.sort() more efficient in FSParentQueue.java
> 
>
> Key: YARN-4090
> URL: https://issues.apache.org/jira/browse/YARN-4090
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: Xianyin Xin
>
> Collections.sort() consumes too much time in a scheduling round.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4090) Make Collections.sort() more efficient in FSParent.java

2015-08-27 Thread Xianyin Xin (JIRA)
Xianyin Xin created YARN-4090:
-

 Summary: Make Collections.sort() more efficient in FSParent.java
 Key: YARN-4090
 URL: https://issues.apache.org/jira/browse/YARN-4090
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Xianyin Xin


Collections.sort() consumes too much time in a scheduling round.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3717) Improve RM node labels web UI

2015-08-27 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3717:

Attachment: YARN-3717.20150826-1.patch

> Improve RM node labels web UI
> -
>
> Key: YARN-3717
> URL: https://issues.apache.org/jira/browse/YARN-3717
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: 3717_cluster_test_snapshots.zip, RMLogsForHungJob.log, 
> YARN-3717.20150822-1.patch, YARN-3717.20150824-1.patch, 
> YARN-3717.20150825-1.patch, YARN-3717.20150826-1.patch
>
>
> 1> Add the default-node-Label expression for each queue in scheduler page.
> 2> In Application/Appattempt page  show the app configured node label 
> expression for AM and Job



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3717) Improve RM node labels web UI

2015-08-27 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3717:

Attachment: (was: YARN-3717.20150826-1.patch)

> Improve RM node labels web UI
> -
>
> Key: YARN-3717
> URL: https://issues.apache.org/jira/browse/YARN-3717
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: 3717_cluster_test_snapshots.zip, RMLogsForHungJob.log, 
> YARN-3717.20150822-1.patch, YARN-3717.20150824-1.patch, 
> YARN-3717.20150825-1.patch
>
>
> 1> Add the default-node-Label expression for each queue in scheduler page.
> 2> In Application/Appattempt page  show the app configured node label 
> expression for AM and Job



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3717) Improve RM node labels web UI

2015-08-27 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716670#comment-14716670
 ] 

Naganarasimha G R commented on YARN-3717:
-

Seems like there is some issue in the build process; test results are not getting 
reported properly. Deleting and re-uploading the patch.

> Improve RM node labels web UI
> -
>
> Key: YARN-3717
> URL: https://issues.apache.org/jira/browse/YARN-3717
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
> Attachments: 3717_cluster_test_snapshots.zip, RMLogsForHungJob.log, 
> YARN-3717.20150822-1.patch, YARN-3717.20150824-1.patch, 
> YARN-3717.20150825-1.patch
>
>
> 1> Add the default-node-Label expression for each queue in scheduler page.
> 2> In Application/Appattempt page  show the app configured node label 
> expression for AM and Job



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-27 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716645#comment-14716645
 ] 

Naganarasimha G R commented on YARN-3893:
-

Oops, saw this message late!

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
> 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
> 0009-YARN-3893.patch, 0010-YARN-3893.patch, yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-27 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716644#comment-14716644
 ] 

Naganarasimha G R commented on YARN-3893:
-

Oops, saw this message late!

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
> 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
> 0009-YARN-3893.patch, 0010-YARN-3893.patch, yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-27 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716642#comment-14716642
 ] 

Naganarasimha G R commented on YARN-3893:
-

Hi [~bibinchundatt]
2> There are test cases related to the transition in 
TestRMAdminService.testRMHAWithFileSystemBasedConfiguration, but most of it is 
present in TestRMHA, so I think it should be fine.

3> Well, IMHO it would be better handled with the latter approach I suggested: 
{{refreshAll}} is just a private method, but the actual operation that failed is 
transitionToActive, and that is more readable than {{ACTIVE_REFRESH_FAIL}}.

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
> 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
> 0009-YARN-3893.patch, 0010-YARN-3893.patch, yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4067) available resource could be set negative

2015-08-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716638#comment-14716638
 ] 

Hadoop QA commented on YARN-4067:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 43s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 43s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |  10m 28s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 50s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 28s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  58m 12s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  97m 54s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12751600/YARN-4067.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 0bf2854 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8928/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8928/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8928/console |


This message was automatically generated.

> available resource could be set negative
> 
>
> Key: YARN-4067
> URL: https://issues.apache.org/jira/browse/YARN-4067
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4067.patch
>
>
> As mentioned in YARN-4045 by [~leftnoteasy], available memory could be 
> negative due to reservation; propose to use componentwiseMax in 
> updateQueueStatistics in order to cap negative values at zero.
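
For illustration only, a minimal sketch of the proposed capping with made-up 
numbers (it assumes the Resources#subtract, #componentwiseMax and #none helpers 
from org.apache.hadoop.yarn.util.resource.Resources; it is not the attached patch):

{code:title=AvailableResourceCapExample.java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class AvailableResourceCapExample {
  public static void main(String[] args) {
    Resource limit = Resource.newInstance(16384, 16); // example queue limit
    Resource used = Resource.newInstance(18432, 18);  // over-committed via reservation
    // subtract() can go negative per component; componentwiseMax() against
    // Resources.none() floors each component at zero.
    Resource available = Resources.componentwiseMax(
        Resources.subtract(limit, used), Resources.none());
    System.out.println(available); // memory and vcores capped at 0 instead of negative
  }
}
{code}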



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4089) Race condition when calling AbstractYarnScheduler.completedContainer.

2015-08-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716629#comment-14716629
 ] 

Hadoop QA commented on YARN-4089:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 45s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 59s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 57s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 51s | The applied patch generated  1 
new checkstyle issues (total was 70, now 71). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 28s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 28s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  53m 30s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  92m 57s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752718/YARN-4089.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 0bf2854 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8927/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8927/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8927/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8927/console |


This message was automatically generated.

> Race condition when calling AbstractYarnScheduler.completedContainer.
> -
>
> Key: YARN-4089
> URL: https://issues.apache.org/jira/browse/YARN-4089
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0, 2.7.0, 2.5.2, 2.7.1
>Reporter: Shiwei Guo
>  Labels: patch
> Attachments: YARN-4089.001.patch
>
>
> There is a  race condition of calling 
> AbstractYarnScheduler.completedContainer, which will cause the usedResource 
> counter of application not accurate. At worst situation, the scheduler will 
> not allocate any resource to any application in some queue( when the 
> usedResource became negative) even there is indeed lots of free resource to 
> be allocated.
> It also cause the Scheduler UI and metrics report negative resource usage 
> value.In our cluster, it has the ability to run 13000+ container, but the WEB 
> UI says that:
> - Containers Running: -26546
> - Memory Used: -82.38 TB
> - VCores Used: -26451
> This is how it happens in FairSchedular:
> completedContainer method will call application.containerCompleted, which 
> will subtraction the resources used by this container from the usedResource 
> counter of the application. So, if the completedContainer are called twice 
> with the same container, the counter is subtracted too much values. So is the 
> updateRootQueueMetrics call, so we can see negative allocatedMemory on 
> rootQueue.
> The solution is to check whether the container being supplied is still live 
> inside the completedContainer (as shown in the patch). There is some check 
> before calling completedContainer, but that's not enough.
> For a more deeply discussion, the completedContainer may be called from two 
> place:
> 1. Trigered by RMContainerEventType.FINISHED event:
> {code:title=FairScheduler.nodeUpdate}
> // Process completed containers
> for (ContainerStatus completedContainer : completedContainers) {
>   ContainerId containerId = completedContainer.getContainerId();
>   LOG.debug("Container FINISHED: " + containerId);
>   completedContainer(getRMContainer(containerId),
>   completedContainer, RMCont

[jira] [Updated] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-27 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-3893:
---
Attachment: 0010-YARN-3893.patch

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
> 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
> 0009-YARN-3893.patch, 0010-YARN-3893.patch, yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-27 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-3893:
---
Attachment: 0009-YARN-3893.patch

Attaching patch after handling comments.
# timeout updated in testcase
# Changed from {{ACTIVE_REFRESH_FAIL}} to {{TRANSITION_TO_ACTIVE_FAILED}}

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
> 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
> 0009-YARN-3893.patch, yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-27 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716563#comment-14716563
 ] 

Bibin A Chundatt commented on YARN-3893:


Hi Naga,

Thanks for looking into the patch.

{quote}
The timeout of 90 is on the higher side; is that much required, or was it for 
local testing?
{quote}
Will update the same.

{quote}
Instead of a test case in RMHA, can we think of adding it to TestRMAdminService, 
as the failure is related to the transition to Active?
{quote}
As I understand, all transitionToActive & HA-related test cases are added in the 
same class.

3. Regarding {{TRANSITION_TO_ACTIVE_FAILED}}: it is not actually 
transitionToActive that is failing, it's {{refreshAll}}, right? That's the reason 
I gave it the specific name.

Points 2 and 3 are not mandatory fix items, right?

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
> 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
> yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-08-27 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716543#comment-14716543
 ] 

Naganarasimha G R commented on YARN-3893:
-

Hi [~bibinchundatt],
Thanks for the patch. The test cases ran fine, and the approach and test case 
seem fine, but a few comments from my side:
# The timeout of 90 is on the higher side; is that much required, or was it for 
local testing?
# Instead of a test case in RMHA, can we think of adding it to TestRMAdminService, 
as the failure is related to the transition to Active?
# Maybe while throwing RMFatalEvent it would be better to wrap the existing 
exception in another one, with a message that the transition to active failed, so 
that the RM logs have clear information on what operation it exited on. Or maybe, 
instead of {{ACTIVE_REFRESH_FAIL}}, the eventType could have the more intuitive 
name {{TRANSITION_TO_ACTIVE_FAILED}}.

> Both RM in active state when Admin#transitionToActive failure from refeshAll()
> --
>
> Key: YARN-3893
> URL: https://issues.apache.org/jira/browse/YARN-3893
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
> 0003-YARN-3893.patch, 0004-YARN-3893.patch, 0005-YARN-3893.patch, 
> 0006-YARN-3893.patch, 0007-YARN-3893.patch, 0008-YARN-3893.patch, 
> yarn-site.xml
>
>
> Cases that can cause this.
> # Capacity scheduler xml is wrongly configured during switch
> # Refresh ACL failure due to configuration
> # Refresh User group failure due to configuration
> Continuously both RM will try to be active
> {code}
> dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm1
> 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin>
>  ./yarn rmadmin  -getServiceState rm2
> 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> active
> {code}
> # Both Web UI active
> # Status shown as active for both RM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3933) Resources(both core and memory) are being negative

2015-08-27 Thread Shiwei Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716506#comment-14716506
 ] 

Shiwei Guo commented on YARN-3933:
--

I created a new [YARN-4089|https://issues.apache.org/jira/browse/YARN-4089] to 
describe the race condition bug for FairScheduler. I'm a newbie to the Hadoop 
community; I hope I didn't do anything wrong. Thanks.

> Resources(both core and memory) are being negative
> --
>
> Key: YARN-3933
> URL: https://issues.apache.org/jira/browse/YARN-3933
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.2
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
>  Labels: patch
> Attachments: patch.BUGFIX-JIRA-YARN-3933.txt
>
>
> In our cluster we are seeing available memory and cores being negative. 
> Initial inspection:
> Scenario no. 1: 
> In capacity scheduler the method allocateContainersToNode() checks if 
> there are excess reservation of containers for an application, and they are 
> no longer needed then it calls queue.completedContainer() which causes 
> resources being negative. And they were never assigned in the first place. 
> I am still looking through the code. Can somebody suggest how to simulate 
> excess containers assignments ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4089) Race condition when calling AbstractYarnScheduler.completedContainer.

2015-08-27 Thread Shiwei Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shiwei Guo updated YARN-4089:
-
Attachment: YARN-4089.001.patch

> Race condition when calling AbstractYarnScheduler.completedContainer.
> -
>
> Key: YARN-4089
> URL: https://issues.apache.org/jira/browse/YARN-4089
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0, 2.7.0, 2.5.2, 2.7.1
>Reporter: Shiwei Guo
>  Labels: patch
> Attachments: YARN-4089.001.patch
>
>
> There is a  race condition of calling 
> AbstractYarnScheduler.completedContainer, which will cause the usedResource 
> counter of application not accurate. At worst situation, the scheduler will 
> not allocate any resource to any application in some queue( when the 
> usedResource became negative) even there is indeed lots of free resource to 
> be allocated.
> It also cause the Scheduler UI and metrics report negative resource usage 
> value.In our cluster, it has the ability to run 13000+ container, but the WEB 
> UI says that:
> - Containers Running: -26546
> - Memory Used: -82.38 TB
> - VCores Used: -26451
> This is how it happens in FairSchedular:
> completedContainer method will call application.containerCompleted, which 
> will subtraction the resources used by this container from the usedResource 
> counter of the application. So, if the completedContainer are called twice 
> with the same container, the counter is subtracted too much values. So is the 
> updateRootQueueMetrics call, so we can see negative allocatedMemory on 
> rootQueue.
> The solution is to check whether the container being supplied is still live 
> inside the completedContainer (as shown in the patch). There is some check 
> before calling completedContainer, but that's not enough.
> For a more deeply discussion, the completedContainer may be called from two 
> place:
> 1. Trigered by RMContainerEventType.FINISHED event:
> {code:title=FairScheduler.nodeUpdate}
> // Process completed containers
> for (ContainerStatus completedContainer : completedContainers) {
>   ContainerId containerId = completedContainer.getContainerId();
>   LOG.debug("Container FINISHED: " + containerId);
>   completedContainer(getRMContainer(containerId),
>   completedContainer, RMContainerEventType.FINISHED);
> }
> {code}
> 2. Trigered by RMContainerEventType.RELEASED
> {code:title=AbstractYarnScheduler.releaseContainers}
> completedContainer(rmContainer,
> SchedulerUtils.createAbnormalContainerStatus(containerId,
>   SchedulerUtils.RELEASED_CONTAINER), RMContainerEventType.RELEASED);
> {code}
> RMContainerEventType.RELEASED is not triggered by MapReduce 
> ApplicationMaster, so we won't see this problem on MR jobs. But TEZ will 
> triggered it when it do not need this this container, while the NodeManger 
> will also report a container complete message to RM ,which in turn trigger 
> the RMContainerEventType.FINISHED event. If RMContainerEventType.FINISHED 
> event comes to RM early than TEZ AM, the problem happens.
> This behavior can be more easily seen if the cluster had setup a 
> TimelineServer for TEZ, which make it more likely TEZ AM will send 
> RMContainerEventType.RELEASED event later than NM send 
> RMContainerEventType.FINISHED.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3816) [Aggregation] App-level Aggregation for YARN system metrics

2015-08-27 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716497#comment-14716497
 ] 

Varun Saxena commented on YARN-3816:


Thanks [~djp] for the replies. 

bq. This will be part of failed over JIRAs
Ok.

bq. I would prefer to use TreeMap because it sorts keys (timestamps) when 
accessing it. The aggregateTo() algorithm assumes metrics are sorted by timestamp.
Hmm... Both getValues and getValuesJAXB return the same map, but I didn't notice 
the return types. So we will have to typecast the return value from getValues to 
use methods specific to TreeMap. In that case, I guess it's fine to use 
getValuesJAXB.

bq. aggregateTo is not straightforward and generically useful like the methods in 
TimelineMetricCalculator, so let's hold off on exposing it as a utility class for 
now. Making it static sounds good though.
Ok.

I had one more question which you missed.
The TimelineMetric#toAggregate flag is meant to indicate whether a metric needs 
to be aggregated, but are we planning to use it to indicate that a metric is an 
aggregated metric as well? If yes, we should probably set this flag for each 
metric processed in TimelineCollector#appendAggregatedMetricsToEntities.
As Li said above, will we be differentiating aggregated metrics from 
non-aggregated ones?

> [Aggregation] App-level Aggregation for YARN system metrics
> ---
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present end user aggregated states for each application, include: 
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show details of states in framework level.
> - Other level (Flow/User/Queue) aggregation can be more efficient to be based 
> on Application-level aggregations rather than raw entity-level data as much 
> less raws need to scan (with filter out non-aggregated entities, like: 
> events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4089) Race condition when calling AbstractYarnScheduler.completedContainer.

2015-08-27 Thread Shiwei Guo (JIRA)
Shiwei Guo created YARN-4089:


 Summary: Race condition when calling 
AbstractYarnScheduler.completedContainer.
 Key: YARN-4089
 URL: https://issues.apache.org/jira/browse/YARN-4089
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.7.1, 2.5.2, 2.7.0, 2.6.0
Reporter: Shiwei Guo


There is a race condition when calling AbstractYarnScheduler.completedContainer, 
which causes the usedResource counter of the application to become inaccurate. In 
the worst case, the scheduler will not allocate any resource to any application in 
some queue (when usedResource becomes negative) even though there is plenty of 
free resource to be allocated.

It also causes the Scheduler UI and metrics to report negative resource usage 
values. Our cluster has the capacity to run 13000+ containers, but the web 
UI says:

- Containers Running: -26546
- Memory Used: -82.38 TB
- VCores Used: -26451

This is how it happens in FairScheduler:

The completedContainer method calls application.containerCompleted, which 
subtracts the resources used by this container from the usedResource counter 
of the application. So, if completedContainer is called twice with the 
same container, too much is subtracted from the counter. The same holds for the 
updateRootQueueMetrics call, which is why we can see negative allocatedMemory on 
the root queue.

The solution is to check whether the container being supplied is still live 
inside the completedContainer (as shown in the patch). There is some check 
before calling completedContainer, but that's not enough.

Digging a bit deeper, completedContainer may be called from two places:

1. Triggered by the RMContainerEventType.FINISHED event:

{code:title=FairScheduler.nodeUpdate}
// Process completed containers
for (ContainerStatus completedContainer : completedContainers) {
  ContainerId containerId = completedContainer.getContainerId();
  LOG.debug("Container FINISHED: " + containerId);
  completedContainer(getRMContainer(containerId),
  completedContainer, RMContainerEventType.FINISHED);
}
{code}

2. Triggered by the RMContainerEventType.RELEASED event:

{code:title=AbstractYarnScheduler.releaseContainers}
completedContainer(rmContainer,
SchedulerUtils.createAbnormalContainerStatus(containerId,
  SchedulerUtils.RELEASED_CONTAINER), RMContainerEventType.RELEASED);
{code}

RMContainerEventType.RELEASED is not triggered by the MapReduce ApplicationMaster, 
so we won't see this problem on MR jobs. But TEZ will trigger it when it no longer 
needs the container, while the NodeManager will also report a container-complete 
message to the RM, which in turn triggers the RMContainerEventType.FINISHED 
event. If the RMContainerEventType.FINISHED event reaches the RM earlier than the 
TEZ AM's RELEASED event, the problem happens.

This behavior can be seen more easily if the cluster has set up a TimelineServer 
for TEZ, which makes it more likely that the TEZ AM will send the 
RMContainerEventType.RELEASED event later than the NM sends 
RMContainerEventType.FINISHED.
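
For illustration only, a minimal stand-alone sketch of the idempotency guard 
described above (hypothetical names and numbers, not the attached patch): 
resources are released only while the container is still tracked as live, so a 
duplicate FINISHED/RELEASED event becomes a no-op instead of subtracting the 
usage twice.

{code:title=CompletedContainerGuardSketch.java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CompletedContainerGuardSketch {
  private final Map<String, Integer> liveContainers =
      new ConcurrentHashMap<String, Integer>();
  private int usedMemoryMb;

  void launch(String containerId, int memoryMb) {
    liveContainers.put(containerId, memoryMb);
    usedMemoryMb += memoryMb;
  }

  void completedContainer(String containerId) {
    // remove() returns null if the container was already completed, so a
    // duplicate event cannot drive usedMemoryMb negative.
    Integer memoryMb = liveContainers.remove(containerId);
    if (memoryMb == null) {
      return; // already processed: ignore the duplicate event
    }
    usedMemoryMb -= memoryMb;
  }

  public static void main(String[] args) {
    CompletedContainerGuardSketch s = new CompletedContainerGuardSketch();
    s.launch("container_1", 1024);
    s.completedContainer("container_1"); // RELEASED from the AM
    s.completedContainer("container_1"); // FINISHED from the NM: ignored
    System.out.println(s.usedMemoryMb);  // prints 0, not -1024
  }
}
{code}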



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-08-27 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716447#comment-14716447
 ] 

Brahma Reddy Battula commented on YARN-3528:


The following test case failure is unrelated; it can be handled in YARN-3433.

{noformat}
testNodeContainerXML(org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesContainers)
  Time elapsed: 0.008 sec  <<< ERROR!
com.sun.jersey.test.framework.spi.container.TestContainerException: 
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:444)
at sun.nio.ch.Net.bind(Net.java:436)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at 
org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:413)
at 
org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:384)
at 
org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:375)
at 
org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:549)
at 
org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:255)
at 
com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:326)
at 
com.sun.jersey.api.container.grizzly2.GrizzlyServerFactory.createHttpServer(GrizzlyServerFactory.java:343)
at 
com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.instantiateGrizzlyWebServer(GrizzlyWebTestContainerFactory.java:219)
at 
com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.<init>(GrizzlyWebTestContainerFactory.java:129)
at 
com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory$GrizzlyWebTestContainer.<init>(GrizzlyWebTestContainerFactory.java:86)
at 
com.sun.jersey.test.framework.spi.container.grizzly2.web.GrizzlyWebTestContainerFactory.create(GrizzlyWebTestContainerFactory.java:79)
at 
com.sun.jersey.test.framework.JerseyTest.getContainer(JerseyTest.java:342)
at com.sun.jersey.test.framework.JerseyTest.<init>(JerseyTest.java:217)
at 
org.apache.hadoop.yarn.webapp.JerseyTestBase.<init>(JerseyTestBase.java:27)
at 
org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesContainers.<init>(TestNMWebServicesContainers.java:180)
{noformat}


> Tests with 12345 as hard-coded port break jenkins
> -
>
> Key: YARN-3528
> URL: https://issues.apache.org/jira/browse/YARN-3528
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
> Environment: ASF Jenkins
>Reporter: Steve Loughran
>Assignee: Brahma Reddy Battula
>Priority: Blocker
>  Labels: test
> Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
> YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, 
> YARN-3528-007.patch, YARN-3528-008.patch, YARN-3528.patch
>
>
> A lot of the YARN tests have hard-coded the port 12345 for their services to 
> come up on.
> This makes it impossible to have scheduled or precommit tests to run 
> consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
> appear to get ignored completely.
> A quick grep of "12345" shows up many places in the test suite where this 
> practise has developed.
> * All {{BaseContainerManagerTest}} subclasses
> * {{TestNodeManagerShutdown}}
> * {{TestContainerManager}}
> + others
> This needs to be addressed through portscanning and dynamic port allocation. 
> Please can someone do this.
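
For illustration only, a minimal sketch of the dynamic port allocation idea 
(hypothetical helper class, not from any of the attached patches): bind to port 0 
so the OS picks a free ephemeral port instead of hard-coding 12345 in tests.

{code:title=FreePortExample.java}
import java.io.IOException;
import java.net.ServerSocket;

public class FreePortExample {
  // Ask the OS for a free ephemeral port; the probe socket is closed so the
  // test service can bind to the returned port right afterwards.
  static int findFreePort() throws IOException {
    ServerSocket socket = new ServerSocket(0);
    try {
      return socket.getLocalPort();
    } finally {
      socket.close();
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println("Test service can bind to 127.0.0.1:" + findFreePort());
  }
}
{code}

Note that a port found this way can in principle be taken again before the test 
binds to it; having the service itself bind to port 0 and reading back the actual 
port is the more robust variant.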



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-08-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716434#comment-14716434
 ] 

Hadoop QA commented on YARN-3528:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |   8m  6s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 8 new or modified test files. |
| {color:green}+1{color} | javac |   7m 55s | There were no new javac warning 
messages. |
| {color:green}+1{color} | release audit |   0m 20s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 46s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 4  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 26s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m  6s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |  22m 57s | Tests passed in 
hadoop-common. |
| {color:red}-1{color} | yarn tests |   7m 24s | Tests failed in 
hadoop-yarn-server-nodemanager. |
| | |  53m 37s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesContainers |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752705/YARN-3528-008.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / fdb56f7 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8926/artifact/patchprocess/whitespace.txt
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8926/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8926/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8926/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8926/console |


This message was automatically generated.

> Tests with 12345 as hard-coded port break jenkins
> -
>
> Key: YARN-3528
> URL: https://issues.apache.org/jira/browse/YARN-3528
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
> Environment: ASF Jenkins
>Reporter: Steve Loughran
>Assignee: Brahma Reddy Battula
>Priority: Blocker
>  Labels: test
> Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
> YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, 
> YARN-3528-007.patch, YARN-3528-008.patch, YARN-3528.patch
>
>
> A lot of the YARN tests have hard-coded the port 12345 for their services to 
> come up on.
> This makes it impossible to have scheduled or precommit tests to run 
> consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
> appear to get ignored completely.
> A quick grep of "12345" shows up many places in the test suite where this 
> practise has developed.
> * All {{BaseContainerManagerTest}} subclasses
> * {{TestNodeManagerShutdown}}
> * {{TestContainerManager}}
> + others
> This needs to be addressed through portscanning and dynamic port allocation. 
> Please can someone do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-08-27 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716371#comment-14716371
 ] 

Brahma Reddy Battula commented on YARN-3528:


Updated the 008 patch to address the above test case failures.

> Tests with 12345 as hard-coded port break jenkins
> -
>
> Key: YARN-3528
> URL: https://issues.apache.org/jira/browse/YARN-3528
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
> Environment: ASF Jenkins
>Reporter: Steve Loughran
>Assignee: Brahma Reddy Battula
>Priority: Blocker
>  Labels: test
> Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
> YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, 
> YARN-3528-007.patch, YARN-3528-008.patch, YARN-3528.patch
>
>
> A lot of the YARN tests have hard-coded the port 12345 for their services to 
> come up on.
> This makes it impossible to have scheduled or precommit tests to run 
> consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
> appear to get ignored completely.
> A quick grep of "12345" shows up many places in the test suite where this 
> practise has developed.
> * All {{BaseContainerManagerTest}} subclasses
> * {{TestNodeManagerShutdown}}
> * {{TestContainerManager}}
> + others
> This needs to be addressed through portscanning and dynamic port allocation. 
> Please can someone do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-08-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716346#comment-14716346
 ] 

Hadoop QA commented on YARN-3528:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |   8m 26s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 8 new or modified test files. |
| {color:green}+1{color} | javac |   7m 55s | There were no new javac warning 
messages. |
| {color:green}+1{color} | release audit |   0m 21s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 45s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  1s | The patch has 4  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 28s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m  5s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |  22m 48s | Tests passed in 
hadoop-common. |
| {color:red}-1{color} | yarn tests |   6m 53s | Tests failed in 
hadoop-yarn-server-nodemanager. |
| | |  53m 19s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.yarn.server.nodemanager.TestNodeStatusUpdater |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12752696/YARN-3528-007.patch |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / fdb56f7 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8925/artifact/patchprocess/whitespace.txt
 |
| hadoop-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8925/artifact/patchprocess/testrun_hadoop-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8925/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8925/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8925/console |


This message was automatically generated.

> Tests with 12345 as hard-coded port break jenkins
> -
>
> Key: YARN-3528
> URL: https://issues.apache.org/jira/browse/YARN-3528
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
> Environment: ASF Jenkins
>Reporter: Steve Loughran
>Assignee: Brahma Reddy Battula
>Priority: Blocker
>  Labels: test
> Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
> YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, 
> YARN-3528-007.patch, YARN-3528-008.patch, YARN-3528.patch
>
>
> A lot of the YARN tests have hard-coded the port 12345 for their services to 
> come up on.
> This makes it impossible to have scheduled or precommit tests to run 
> consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
> appear to get ignored completely.
> A quick grep of "12345" turns up many places in the test suite where this 
> practice has developed.
> * All {{BaseContainerManagerTest}} subclasses
> * {{TestNodeManagerShutdown}}
> * {{TestContainerManager}}
> + others
> This needs to be addressed through portscanning and dynamic port allocation. 
> Please can someone do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-08-27 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716343#comment-14716343
 ] 

Brahma Reddy Battula commented on YARN-3528:


Hmm. Updated the patch to address all the comments. [~rkanter] and 
[~varun_saxena], kindly review. Thanks.

> Tests with 12345 as hard-coded port break jenkins
> -
>
> Key: YARN-3528
> URL: https://issues.apache.org/jira/browse/YARN-3528
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
> Environment: ASF Jenkins
>Reporter: Steve Loughran
>Assignee: Brahma Reddy Battula
>Priority: Blocker
>  Labels: test
> Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
> YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, 
> YARN-3528-007.patch, YARN-3528-008.patch, YARN-3528.patch
>
>
> A lot of the YARN tests have hard-coded the port 12345 for their services to 
> come up on.
> This makes it impossible to have scheduled or precommit tests to run 
> consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
> appear to get ignored completely.
> A quick grep of "12345" turns up many places in the test suite where this 
> practice has developed.
> * All {{BaseContainerManagerTest}} subclasses
> * {{TestNodeManagerShutdown}}
> * {{TestContainerManager}}
> + others
> This needs to be addressed through portscanning and dynamic port allocation. 
> Please can someone do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-08-27 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-3528:
---
Attachment: YARN-3528-008.patch

> Tests with 12345 as hard-coded port break jenkins
> -
>
> Key: YARN-3528
> URL: https://issues.apache.org/jira/browse/YARN-3528
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
> Environment: ASF Jenkins
>Reporter: Steve Loughran
>Assignee: Brahma Reddy Battula
>Priority: Blocker
>  Labels: test
> Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
> YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, 
> YARN-3528-007.patch, YARN-3528-008.patch, YARN-3528.patch
>
>
> A lot of the YARN tests have hard-coded the port 12345 for their services to 
> come up on.
> This makes it impossible to have scheduled or precommit tests to run 
> consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
> appear to get ignored completely.
> A quick grep of "12345" turns up many places in the test suite where this 
> practice has developed.
> * All {{BaseContainerManagerTest}} subclasses
> * {{TestNodeManagerShutdown}}
> * {{TestContainerManager}}
> + others
> This needs to be addressed through portscanning and dynamic port allocation. 
> Please can someone do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-08-27 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716306#comment-14716306
 ] 

Varun Saxena commented on YARN-3528:


Thanks for updating the patch [~brahmareddy].

In the latest patch, the same port is assigned to both NM_ADDRESS and 
NM_LOCALIZER_ADDRESS. I haven't run the test, but this should lead to a 
BindException in the test.
{code}
conf.set(YarnConfiguration.NM_ADDRESS, localhostAddress + ":" + port);
conf.set(YarnConfiguration.NM_LOCALIZER_ADDRESS, localhostAddress + ":" + port);
{code}
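A minimal sketch of the suggested fix, assuming each NodeManager service gets its own dynamically chosen port; getFreePort is a hypothetical helper along the lines of binding a ServerSocket to port 0, not something taken from the patch.
{code}
// Sketch only: give NM_ADDRESS and NM_LOCALIZER_ADDRESS distinct ports so the two
// servers do not collide on bind. getFreePort() is a hypothetical helper returning
// an unused ephemeral port (e.g. via new ServerSocket(0).getLocalPort()).
int nmPort = getFreePort();
int localizerPort = getFreePort();
conf.set(YarnConfiguration.NM_ADDRESS, localhostAddress + ":" + nmPort);
conf.set(YarnConfiguration.NM_LOCALIZER_ADDRESS, localhostAddress + ":" + localizerPort);
{code}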

> Tests with 12345 as hard-coded port break jenkins
> -
>
> Key: YARN-3528
> URL: https://issues.apache.org/jira/browse/YARN-3528
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
> Environment: ASF Jenkins
>Reporter: Steve Loughran
>Assignee: Brahma Reddy Battula
>Priority: Blocker
>  Labels: test
> Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
> YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, 
> YARN-3528-007.patch, YARN-3528.patch
>
>
> A lot of the YARN tests have hard-coded the port 12345 for their services to 
> come up on.
> This makes it impossible to have scheduled or precommit tests to run 
> consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
> appear to get ignored completely.
> A quick grep of "12345" turns up many places in the test suite where this 
> practice has developed.
> * All {{BaseContainerManagerTest}} subclasses
> * {{TestNodeManagerShutdown}}
> * {{TestContainerManager}}
> + others
> This needs to be addressed through portscanning and dynamic port allocation. 
> Please can someone do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-08-27 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated YARN-3528:
---
Attachment: YARN-3528-007.patch

> Tests with 12345 as hard-coded port break jenkins
> -
>
> Key: YARN-3528
> URL: https://issues.apache.org/jira/browse/YARN-3528
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
> Environment: ASF Jenkins
>Reporter: Steve Loughran
>Assignee: Brahma Reddy Battula
>Priority: Blocker
>  Labels: test
> Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
> YARN-3528-004.patch, YARN-3528-005.patch, YARN-3528-006.patch, 
> YARN-3528-007.patch, YARN-3528.patch
>
>
> A lot of the YARN tests have hard-coded the port 12345 for their services to 
> come up on.
> This makes it impossible to have scheduled or precommit tests to run 
> consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
> appear to get ignored completely.
> A quick grep of "12345" turns up many places in the test suite where this 
> practice has developed.
> * All {{BaseContainerManagerTest}} subclasses
> * {{TestNodeManagerShutdown}}
> * {{TestContainerManager}}
> + others
> This needs to be addressed through portscanning and dynamic port allocation. 
> Please can someone do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4081) Add support for multiple resource types in the Resource class

2015-08-27 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716228#comment-14716228
 ] 

Varun Vasudev commented on YARN-4081:
-

Thanks for the feedback, Srikanth!

bq. why use a Map. Is there a rough idea how many different resources one may 
want to encode?

I would like the resource types supported to be configured and not set in code. 
There's a proposal attached in the parent JIRA that goes into more detail on 
this. We've seen tickets filed on YARN for adding disk, network, and HDFS 
bandwidth as resource types. I would prefer it if we could let users simply 
configure the types they want to use and allow them to add arbitrary resource 
types for scheduling (for example, scheduling based on the number of licenses 
available on a node). Is there an alternate structure you would prefer for me 
to use?

bq. Ditto for encapsulating strings in URIs
In the proposal, I propose using URIs as the identifiers for resource types 
(similar to what Kubernetes uses).

bq. ResourceInformation wrapper over doubles
I didn't understand this - are you asking why we're using ResourceInformation 
instead of plain doubles?
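
To make the shape of this proposal a little more concrete, here is a rough, hypothetical sketch of a map-based resource holder with configurable types. It is not the attached patch; the class names, fields, and the idea of keying the map by a URI-style string are assumptions drawn from the discussion above.
{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; the real YARN-4081 patch defines its own classes.
class ResourceInformation {
  private final double value;  // a wrapper over a double, as discussed above
  private final String units;  // e.g. "Mi" for memory, "" for a plain count

  ResourceInformation(double value, String units) {
    this.value = value;
    this.units = units;
  }

  double getValue() { return value; }
  String getUnits() { return units; }
}

class MultiTypeResource {
  // Keyed by a resource-type identifier (a URI-style string), so types such as disk,
  // network or HDFS bandwidth can be added through configuration rather than code.
  private final Map<String, ResourceInformation> resources = new HashMap<>();

  void setResourceInformation(String resourceUri, ResourceInformation info) {
    resources.put(resourceUri, info);
  }

  ResourceInformation getResourceInformation(String resourceUri) {
    return resources.get(resourceUri);
  }
}
{code}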

> Add support for multiple resource types in the Resource class
> -
>
> Key: YARN-4081
> URL: https://issues.apache.org/jira/browse/YARN-4081
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4081-YARN-3926.001.patch
>
>
> For adding support for multiple resource types, we need to add support for 
> this in the Resource class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3798) ZKRMStateStore shouldn't create new session without occurrence of SESSIONEXPIRED

2015-08-27 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716206#comment-14716206
 ] 

zhihai xu commented on YARN-3798:
-

[~ozawa], Yes, the latest patch YARN-3798-branch-2.7.006.patch looks good to me.

> ZKRMStateStore shouldn't create new session without occurrence of 
> SESSIONEXPIRED
> ---
>
> Key: YARN-3798
> URL: https://issues.apache.org/jira/browse/YARN-3798
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
> Environment: Suse 11 Sp3
>Reporter: Bibin A Chundatt
>Assignee: Varun Saxena
>Priority: Blocker
>  Labels: 2.6.1-candidate
> Attachments: RM.log, YARN-3798-2.7.002.patch, 
> YARN-3798-branch-2.6.01.patch, YARN-3798-branch-2.7.002.patch, 
> YARN-3798-branch-2.7.003.patch, YARN-3798-branch-2.7.004.patch, 
> YARN-3798-branch-2.7.005.patch, YARN-3798-branch-2.7.006.patch, 
> YARN-3798-branch-2.7.patch
>
>
> RM goes down with a NoNode exception during creation of the znode for an app attempt
> *Please find the exception logs*
> {code}
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session connected
> 2015-06-09 10:09:44,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session restored
> 2015-06-09 10:09:44,886 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> Exception while executing a ZK operation.
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1405)
>   at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1310)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:926)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1101)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1122)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:923)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doStoreMultiWithRetries(ZKRMStateStore.java:937)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:970)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:671)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:275)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$UpdateAppAttemptTransition.transition(RMStateStore.java:260)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:837)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:900)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:895)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:175)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
>   at java.lang.Thread.run(Thread.java:745)
> 2015-06-09 10:09:44,887 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: Maxed 
> out ZK retries. Giving up!
> 2015-06-09 10:09:44,887 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error 
> updating appAttempt: appattempt_1433764310492_7152_01
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:115)
>   at org.apache.zookeeper.ZooKeeper.multiInternal(Zoo