[jira] [Comment Edited] (YARN-6678) Committer thread crashes with IllegalStateException in async-scheduling mode of CapacityScheduler

2017-06-22 Thread Tao Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060372#comment-16060372
 ] 

Tao Yang edited comment on YARN-6678 at 6/23/17 4:11 AM:
-

Thanks [~sunilg] for your comments.
{quote}
1. In FiCaSchedulerApp#accept, it's better to use RMContainer#equals instead of 
using !=
{quote}
As [~leftnoteasy] mentioned, it should be enough to use == to compare the two 
instances. Are there any other concerns about this?
I noticed that this patch caused several test failures, but they all pass when 
I run them locally. What might be the problem?


was (Author: tao yang):
Thanks [~sunilg] for your comments.
{quote}
1. In FiCaSchedulerApp#accept, it's better to use RMContainer#equals instead of 
using !=
{quote}
As [~leftnoteasy] mentioned, it should be enough to use == to compare the two 
instances. Are there any other concerns about this?

> Committer thread crashes with IllegalStateException in async-scheduling mode 
> of CapacityScheduler
> -
>
> Key: YARN-6678
> URL: https://issues.apache.org/jira/browse/YARN-6678
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.9.0, 3.0.0-alpha3
>Reporter: Tao Yang
>Assignee: Tao Yang
> Attachments: YARN-6678.001.patch, YARN-6678.002.patch, 
> YARN-6678.003.patch
>
>
> Error log:
> {noformat}
> java.lang.IllegalStateException: Trying to reserve container 
> container_e10_1495599791406_7129_01_001453 for application 
> appattempt_1495599791406_7129_01 when currently reserved container 
> container_e10_1495599791406_7123_01_001513 on node host: node0123:45454 
> #containers=40 available=... used=...
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode.reserveResource(FiCaSchedulerNode.java:81)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.reserve(FiCaSchedulerApp.java:1079)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:795)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2770)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:546)
> {noformat}
> Steps to reproduce this problem:
> 1. nm1 re-reserved app-1/container-X1 and generated reserve proposal-1
> 2. nm2 had enough resources for app-1, so it un-reserved app-1/container-X1 and 
> allocated app-1/container-X2
> 3. nm1 reserved app-2/container-Y
> 4. proposal-1 was accepted but threw an IllegalStateException when it was applied
> Currently the check for a reserve proposal in FiCaSchedulerApp#accept is as 
> follows:
> {code}
>   // Container reserved first time will be NEW, after the container
>   // accepted & confirmed, it will become RESERVED state
>   if (schedulerContainer.getRmContainer().getState()
>   == RMContainerState.RESERVED) {
> // Set reReservation == true
> reReservation = true;
>   } else {
> // When reserve a resource (state == NEW is for new container,
> // state == RUNNING is for increase container).
> // Just check if the node is not already reserved by someone
> if (schedulerContainer.getSchedulerNode().getReservedContainer()
> != null) {
>   if (LOG.isDebugEnabled()) {
> LOG.debug("Try to reserve a container, but the node is "
> + "already reserved by another container="
> + schedulerContainer.getSchedulerNode()
> .getReservedContainer().getContainerId());
>   }
>   return false;
> }
>   }
> {code}
> The reserved container on the node of the reserve proposal is checked only 
> for first-reserve containers.
> For re-reservations, we should also confirm that the container currently 
> reserved on this node is the same as the container being re-reserved.
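
As an editorial illustration of the check proposed above, a minimal sketch of 
what the re-reservation branch of FiCaSchedulerApp#accept could additionally 
verify (variable names follow the quoted code; this is not the actual patch):
{code}
// Hypothetical sketch: before accepting a re-reserve proposal, verify that the
// container currently reserved on the node is the very container being
// re-reserved; otherwise reject the proposal instead of letting apply() throw.
RMContainer nodeReserved =
    schedulerContainer.getSchedulerNode().getReservedContainer();
if (reReservation
    && nodeReserved != null
    && nodeReserved != schedulerContainer.getRmContainer()) {
  // The node has since been reserved for a different container (e.g.
  // app-2/container-Y in the reproduction steps), so this proposal is stale.
  return false;
}
{code}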






[jira] [Commented] (YARN-6678) Committer thread crashes with IllegalStateException in async-scheduling mode of CapacityScheduler

2017-06-22 Thread Tao Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060372#comment-16060372
 ] 

Tao Yang commented on YARN-6678:


Thanks [~sunilg] for your comments.
{quote}
1. In FiCaSchedulerApp#accept, it's better to use RMContainer#equals instead of 
using !=
{quote}
As [~leftnoteasy] mentioned, it should be enough to use == to compare the two 
instances. Are there any other concerns about this?

> Committer thread crashes with IllegalStateException in async-scheduling mode 
> of CapacityScheduler
> -
>
> Key: YARN-6678
> URL: https://issues.apache.org/jira/browse/YARN-6678
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.9.0, 3.0.0-alpha3
>Reporter: Tao Yang
>Assignee: Tao Yang
> Attachments: YARN-6678.001.patch, YARN-6678.002.patch, 
> YARN-6678.003.patch
>
>
> Error log:
> {noformat}
> java.lang.IllegalStateException: Trying to reserve container 
> container_e10_1495599791406_7129_01_001453 for application 
> appattempt_1495599791406_7129_01 when currently reserved container 
> container_e10_1495599791406_7123_01_001513 on node host: node0123:45454 
> #containers=40 available=... used=...
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode.reserveResource(FiCaSchedulerNode.java:81)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.reserve(FiCaSchedulerApp.java:1079)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:795)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2770)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:546)
> {noformat}
> Steps to reproduce this problem:
> 1. nm1 re-reserved app-1/container-X1 and generated reserve proposal-1
> 2. nm2 had enough resources for app-1, so it un-reserved app-1/container-X1 and 
> allocated app-1/container-X2
> 3. nm1 reserved app-2/container-Y
> 4. proposal-1 was accepted but threw an IllegalStateException when it was applied
> Currently the check for a reserve proposal in FiCaSchedulerApp#accept is as 
> follows:
> {code}
>   // Container reserved first time will be NEW, after the container
>   // accepted & confirmed, it will become RESERVED state
>   if (schedulerContainer.getRmContainer().getState()
>   == RMContainerState.RESERVED) {
> // Set reReservation == true
> reReservation = true;
>   } else {
> // When reserve a resource (state == NEW is for new container,
> // state == RUNNING is for increase container).
> // Just check if the node is not already reserved by someone
> if (schedulerContainer.getSchedulerNode().getReservedContainer()
> != null) {
>   if (LOG.isDebugEnabled()) {
> LOG.debug("Try to reserve a container, but the node is "
> + "already reserved by another container="
> + schedulerContainer.getSchedulerNode()
> .getReservedContainer().getContainerId());
>   }
>   return false;
> }
>   }
> {code}
> The reserved container on the node of the reserve proposal is checked only 
> for first-reserve containers.
> For re-reservations, we should also confirm that the container currently 
> reserved on this node is the same as the container being re-reserved.






[jira] [Commented] (YARN-5006) ResourceManager quit due to ApplicationStateData exceed the limit size of znode in zk

2017-06-22 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060355#comment-16060355
 ] 

Naganarasimha G R commented on YARN-5006:
-

[~bibinchundatt], there seems to be a compilation problem for branch-2 on 
cherry-pick. Can you please check and upload a patch for branch-2?

> ResourceManager quit due to ApplicationStateData exceed the limit  size of 
> znode in zk
> --
>
> Key: YARN-5006
> URL: https://issues.apache.org/jira/browse/YARN-5006
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0, 2.7.2
>Reporter: dongtingting
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-5006.001.patch, YARN-5006.002.patch, 
> YARN-5006.003.patch, YARN-5006.004.patch, YARN-5006.005.patch
>
>
> A client submits a job that adds 1 file into the DistributedCache. When the 
> job is submitted, the ResourceManager stores the ApplicationStateData into 
> ZooKeeper. The ApplicationStateData exceeds the znode size limit, and the RM 
> exits with code 1.
> The related code in RMStateStore.java:
> {code}
>   private static class StoreAppTransition
>   implements SingleArcTransition<RMStateStore, RMStateStoreEvent> {
> @Override
> public void transition(RMStateStore store, RMStateStoreEvent event) {
>   if (!(event instanceof RMStateStoreAppEvent)) {
> // should never happen
> LOG.error("Illegal event type: " + event.getClass());
> return;
>   }
>   ApplicationState appState = ((RMStateStoreAppEvent) 
> event).getAppState();
>   ApplicationId appId = appState.getAppId();
>   ApplicationStateData appStateData = ApplicationStateData
>   .newInstance(appState);
>   LOG.info("Storing info for app: " + appId);
>   try {  
> store.storeApplicationStateInternal(appId, appStateData);  //store 
> the appStateData
> store.notifyApplication(new RMAppEvent(appId,
>RMAppEventType.APP_NEW_SAVED));
>   } catch (Exception e) {
> LOG.error("Error storing app: " + appId, e);
> store.notifyStoreOperationFailed(e);   //handle fail event, system 
> exit 
>   }
> };
>   }
> {code}
> The Exception log:
> {code}
>  ...
> 2016-04-20 11:26:35,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore 
> AsyncDispatcher event handler: Maxed out ZK retries. Giving up!
> 2016-04-20 11:26:35,732 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore 
> AsyncDispatcher event handler: Error storing app: 
> application_1461061795989_17671
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931)
> at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:936)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:933)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1075)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1096)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:933)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.doMultiWithRetries(ZKRMStateStore.java:947)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.createWithRetries(ZKRMStateStore.java:956)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeApplicationStateInternal(ZKRMStateStore.java:626)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:138)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:123)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> 

[jira] [Commented] (YARN-3659) Federation Router (hiding multiple RMs for ApplicationClientProtocol)

2017-06-22 Thread Subru Krishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060350#comment-16060350
 ] 

Subru Krishnan commented on YARN-3659:
--

Thanks [~giovanni.fumarola] for working on this. I looked at your patch; please 
find my comments below:
* A general note on Javadocs: code/link annotations are missing throughout.
* Can you add some documentation for ROUTER_CLIENTRM_SUBMIT_RETRY? You will 
also need to exclude it in {{TestYarnConfigurationFields}}.
* Is it possible to move the check of subcluster list being empty to 
{{AbstractRouterPolicy}} to prevent duplication?
* A note on *isRunning* field in {{MockResourceManagerFacade}} will be useful.
* Rename NO_SUBCLUSTER_MESSAGE to NO_ACTIVE_SUBCLUSTER_AVAILABLE and update the 
value accordingly.
* Why is the visibility of {{RouterUtil}} public? I feel you can also rename it 
to RouterServerUtil.
* Resolution of UGI can be moved to {{AbstractClientRequestInterceptor}} as 
again all child classes are duplicating the same code.
* Would it be better to use *LRUCacheHashMap* for _clientRMProxies_? (See the 
sketch after this list.)
* Rename *getApplicationClientProtocolFromSubClusterId* --> 
*getClientRMProxyForSubCluster*.
* Distinguish the log message for different exceptions for clarity.
* Use the {{UniformRandomRouterPolicy}} to determine a random subcluster as 
that's the purpose of the policy.
* The javadoc {code}Client: behavior as YARN.{code} should read 
{code}Client: identical behavior as {@code ClientRMService}.{code}
* Documentation for exception handling in *submitApplication* is unclear, 
please rephrase it.
* Rename _aclTemp_ to _clientRMProxy_.
* {code}"Unable to create a new application to SubCluster " should be " No 
response when attempting to submit the application " + applicationId + "to 
SubCluster " + subClusterId.getId() {code}.
* The log statement for app submission can be moved to post successful 
submission as it's redundant now.
* We should forward any exceptions we get on *forceKillApplication* and 
*getApplicationReport* to the client.
* Have a note upfront saying we have only implemented the core functionality 
and the rest will be done in a follow-up JIRA (create and link it).
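
On the *LRUCacheHashMap* point above, a generic illustration only (not the 
Hadoop class itself): a bounded LRU cache can be built on 
java.util.LinkedHashMap in access order, which is the behavior being suggested 
for caching per-subcluster client proxies.
{code}
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Minimal LRU map sketch: evicts the least-recently-accessed entry once the
 * configured capacity is exceeded. Illustrative only.
 */
public class SimpleLruCache<K, V> extends LinkedHashMap<K, V> {
  private final int maxSize;

  public SimpleLruCache(int maxSize) {
    super(16, 0.75f, true);   // accessOrder = true gives LRU iteration order
    this.maxSize = maxSize;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    return size() > maxSize;  // drop the eldest entry when over capacity
  }
}
{code}
A cache like this would cap how many ApplicationClientProtocol proxies the 
Router keeps alive at once.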


> Federation Router (hiding multiple RMs for ApplicationClientProtocol)
> -
>
> Key: YARN-3659
> URL: https://issues.apache.org/jira/browse/YARN-3659
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client, resourcemanager
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-3659.pdf, YARN-3659-YARN-2915.1.patch, 
> YARN-3659-YARN-2915.draft.patch
>
>
> This JIRA tracks the design/implementation of the layer for routing 
> ApplicationClientProtocol requests to the appropriate
> RM(s) in a federated YARN cluster.






[jira] [Commented] (YARN-5006) ResourceManager quit due to ApplicationStateData exceed the limit size of znode in zk

2017-06-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060341#comment-16060341
 ] 

Hudson commented on YARN-5006:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11912 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11912/])
YARN-5006. ResourceManager quit due to ApplicationStateData exceed the 
(naganarasimha_gr: rev 740204b2926f49ea70596c6059582ce409fbdd90)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/StoreLimitException.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppEvent.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java
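
The file list above shows the fix adds a StoreLimitException and new 
configuration. As an editorial illustration only, a minimal, self-contained 
sketch of the size-guard idea (the class name, method name, and limit handling 
here are assumptions, not the committed code):
{code}
import java.io.IOException;

public final class ZnodeSizeGuard {
  private ZnodeSizeGuard() { }

  /**
   * Refuse to store a serialized state blob that exceeds the configured znode
   * size limit, so a single oversized application fails instead of the whole
   * ResourceManager exiting when the ZooKeeper write blows up.
   */
  public static void checkSize(byte[] serializedState, int maxZnodeSizeBytes)
      throws IOException {
    if (serializedState.length > maxZnodeSizeBytes) {
      // The actual fix surfaces this as a dedicated StoreLimitException.
      throw new IOException("Application state is " + serializedState.length
          + " bytes, which exceeds the znode limit of " + maxZnodeSizeBytes
          + " bytes");
    }
    // ...otherwise proceed with the ZooKeeper write (omitted here).
  }
}
{code}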


> ResourceManager quit due to ApplicationStateData exceed the limit  size of 
> znode in zk
> --
>
> Key: YARN-5006
> URL: https://issues.apache.org/jira/browse/YARN-5006
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0, 2.7.2
>Reporter: dongtingting
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-5006.001.patch, YARN-5006.002.patch, 
> YARN-5006.003.patch, YARN-5006.004.patch, YARN-5006.005.patch
>
>
> A client submits a job that adds 1 file into the DistributedCache. When the 
> job is submitted, the ResourceManager stores the ApplicationStateData into 
> ZooKeeper. The ApplicationStateData exceeds the znode size limit, and the RM 
> exits with code 1.
> The related code in RMStateStore.java:
> {code}
>   private static class StoreAppTransition
>   implements SingleArcTransition<RMStateStore, RMStateStoreEvent> {
> @Override
> public void transition(RMStateStore store, RMStateStoreEvent event) {
>   if (!(event instanceof RMStateStoreAppEvent)) {
> // should never happen
> LOG.error("Illegal event type: " + event.getClass());
> return;
>   }
>   ApplicationState appState = ((RMStateStoreAppEvent) 
> event).getAppState();
>   ApplicationId appId = appState.getAppId();
>   ApplicationStateData appStateData = ApplicationStateData
>   .newInstance(appState);
>   LOG.info("Storing info for app: " + appId);
>   try {  
> store.storeApplicationStateInternal(appId, appStateData);  //store 
> the appStateData
> store.notifyApplication(new RMAppEvent(appId,
>RMAppEventType.APP_NEW_SAVED));
>   } catch (Exception e) {
> LOG.error("Error storing app: " + appId, e);
> store.notifyStoreOperationFailed(e);   //handle fail event, system 
> exit 
>   }
> };
>   }
> {code}
> The Exception log:
> {code}
>  ...
> 2016-04-20 11:26:35,732 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore 
> AsyncDispatcher event handler: Maxed out ZK retries. Giving up!
> 2016-04-20 11:26:35,732 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore 
> AsyncDispatcher event handler: Error storing app: 
> application_1461061795989_17671
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931)
> at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:936)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$4.run(ZKRMStateStore.java:933)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1075)
> at 
> 

[jira] [Updated] (YARN-3659) Federation Router (hiding multiple RMs for ApplicationClientProtocol)

2017-06-22 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-3659:
---
Attachment: YARN-3659-YARN-2915.1.patch

> Federation Router (hiding multiple RMs for ApplicationClientProtocol)
> -
>
> Key: YARN-3659
> URL: https://issues.apache.org/jira/browse/YARN-3659
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: client, resourcemanager
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-3659.pdf, YARN-3659-YARN-2915.1.patch, 
> YARN-3659-YARN-2915.draft.patch
>
>
> This JIRA tracks the design/implementation of the layer for routing 
> ApplicationClientProtocol requests to the appropriate
> RM(s) in a federated YARN cluster.






[jira] [Assigned] (YARN-6734) Ensure sub-application user is extracted & sent to timeline service

2017-06-22 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S reassigned YARN-6734:
---

Assignee: Rohith Sharma K S

> Ensure sub-application user is extracted & sent to timeline service
> ---
>
> Key: YARN-6734
> URL: https://issues.apache.org/jira/browse/YARN-6734
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Rohith Sharma K S
>
> After a discussion with Tez folks, we have been thinking about introducing a 
> table to store sub-application information. YARN-6733
> For example, a Tez session may run for a certain period as user X and run a 
> few AMs. These AMs accept DAGs from other users. Tez will execute these DAGs 
> with a doAs user. ATSv2 should store this information in a new table, perhaps 
> called the "sub_application" table. 
> YARN-6733 tracks the code changes needed for table schema creation.
> This jira tracks writing to that table, updating the user name fields to 
> include the sub-application user, etc. This would mean adding a field to the 
> Flow Context which can store an additional user.






[jira] [Assigned] (YARN-6736) Consider writing to both ats v1 & v2 from RM for smoother upgrades

2017-06-22 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S reassigned YARN-6736:
---

Assignee: Rohith Sharma K S

> Consider writing to both ats v1 & v2 from RM for smoother upgrades
> --
>
> Key: YARN-6736
> URL: https://issues.apache.org/jira/browse/YARN-6736
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Rohith Sharma K S
>
> When the cluster is being upgraded from atsv1 to v2, it may be good to have a 
> brief time period during which RM writes to both atsv1 and v2. This will help 
> frameworks like Tez migrate more smoothly. 






[jira] [Assigned] (YARN-6735) Have a way to turn off system metrics from NMs

2017-06-22 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S reassigned YARN-6735:
---

Assignee: Rohith Sharma K S

> Have a way to turn off system metrics from NMs
> --
>
> Key: YARN-6735
> URL: https://issues.apache.org/jira/browse/YARN-6735
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Rohith Sharma K S
>
> Have a way to turn off emitting system metrics from NMs






[jira] [Created] (YARN-6736) Consider writing to both ats v1 & v2 from RM for smoother upgrades

2017-06-22 Thread Vrushali C (JIRA)
Vrushali C created YARN-6736:


 Summary: Consider writing to both ats v1 & v2 from RM for smoother 
upgrades
 Key: YARN-6736
 URL: https://issues.apache.org/jira/browse/YARN-6736
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vrushali C



When the cluster is being upgraded from atsv1 to v2, it may be good to have a 
brief time period during which RM writes to both atsv1 and v2. This will help 
frameworks like Tez migrate more smoothly. 






[jira] [Created] (YARN-6735) Have a way to turn off system metrics from NMs

2017-06-22 Thread Vrushali C (JIRA)
Vrushali C created YARN-6735:


 Summary: Have a way to turn off system metrics from NMs
 Key: YARN-6735
 URL: https://issues.apache.org/jira/browse/YARN-6735
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vrushali C



Have a way to turn off emitting system metrics from NMs






[jira] [Created] (YARN-6734) Ensure sub-application user is extracted & sent to timeline service

2017-06-22 Thread Vrushali C (JIRA)
Vrushali C created YARN-6734:


 Summary: Ensure sub-application user is extracted & sent to 
timeline service
 Key: YARN-6734
 URL: https://issues.apache.org/jira/browse/YARN-6734
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vrushali C



After a discussion with Tez folks, we have been thinking about introducing a 
table to store sub-application information. YARN-6733

For example, a Tez session may run for a certain period as user X and run a 
few AMs. These AMs accept DAGs from other users. Tez will execute these DAGs 
with a doAs user. ATSv2 should store this information in a new table, perhaps 
called the "sub_application" table. 
YARN-6733 tracks the code changes needed for table schema creation.

This jira tracks writing to that table, updating the user name fields to 
include the sub-application user, etc. This would mean adding a field to the 
Flow Context which can store an additional user.







[jira] [Updated] (YARN-6733) Add table for storing sub-application entities

2017-06-22 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-6733:
-
Summary: Add table for storing sub-application entities  (was: Support 
storing sub-application entities)

> Add table for storing sub-application entities
> --
>
> Key: YARN-6733
> URL: https://issues.apache.org/jira/browse/YARN-6733
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
>
> After a discussion with Tez folks, we have been thinking about introducing a 
> table to store sub-application information.
> For example, a Tez session may run for a certain period as user X and run a 
> few AMs. These AMs accept DAGs from other users. Tez will execute these DAGs 
> with a doAs user. ATSv2 should store this information in a new table, perhaps 
> called the "sub_application" table. 
> This jira tracks the code changes needed for table schema creation.
> I will file other jiras for writing to that table, updating the user name 
> fields to include the sub-application user, etc.






[jira] [Created] (YARN-6733) Support storing sub-application entities

2017-06-22 Thread Vrushali C (JIRA)
Vrushali C created YARN-6733:


 Summary: Support storing sub-application entities
 Key: YARN-6733
 URL: https://issues.apache.org/jira/browse/YARN-6733
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vrushali C
Assignee: Vrushali C



After a discussion with Tez folks, we have been thinking about introducing a 
table to store sub-application information.

For example, a Tez session may run for a certain period as user X and run a 
few AMs. These AMs accept DAGs from other users. Tez will execute these DAGs 
with a doAs user. ATSv2 should store this information in a new table, perhaps 
called the "sub_application" table. 

This jira tracks the code changes needed for table schema creation.

I will file other jiras for writing to that table, updating the user name 
fields to include the sub-application user, etc.






[jira] [Comment Edited] (YARN-6625) yarn application -list returns a tracking URL for AM that doesn't work in secured and HA environment

2017-06-22 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060140#comment-16060140
 ] 

Yufei Gu edited comment on YARN-6625 at 6/22/17 11:52 PM:
--

Thanks [~rkanter] for the review. I uploaded patch v2 based on our offline 
discussion.
First, we don't use the solution in patch v1, since it is not a good idea to 
let the AM access the admin service of the RM. Second, the AM cannot get the 
RM HA status through the scheduler service, since there is no scheduler 
service in a standby RM.

Patch v2 provides a solution that tries the RMs one by one until we find an 
alive RM, based on the assumption that it doesn't matter whether the RM is 
active or standby in this situation: if the AM redirects to a standby RM, the 
standby RM will redirect to the active RM anyway. The only concern is that we 
should make sure the RM is alive.
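
As an editorial illustration of the "try RMs one by one" approach described 
above, a minimal sketch (the class name, probed path, and timeouts are 
assumptions, not the actual patch):
{code}
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;

public final class RmProbe {
  private RmProbe() { }

  /**
   * Probe each configured RM web address in order and return the first one
   * that answers. Whether it is active or standby does not matter here,
   * because a standby RM redirects proxy requests to the active RM.
   */
  public static String findAliveRm(List<String> rmWebAddresses) {
    for (String address : rmWebAddresses) {
      try {
        HttpURLConnection conn = (HttpURLConnection)
            new URL("http://" + address + "/cluster").openConnection();
        conn.setConnectTimeout(2000);
        conn.setReadTimeout(2000);
        conn.connect();        // reachable => treat this RM as alive
        conn.disconnect();
        return address;
      } catch (IOException e) {
        // unreachable, try the next RM
      }
    }
    return null;               // no alive RM found
  }
}
{code}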


was (Author: yufeigu):
Thanks [~rkanter] for the review. I uploaded patch v2 based on our offline 
discussion.
First, we don't use the solution in patch v1; it seems like it is not a good 
idea to let the AM access the admin service of the RM. Meanwhile, the AM 
cannot get the RM HA status through the scheduler service, since there is no 
scheduler service in a standby RM.

Patch v2 provides a solution that tries the RMs one by one until we find an 
alive RM, based on the assumption that it doesn't matter whether the RM is 
active or standby in this situation: if the AM redirects to a standby RM, the 
standby RM will redirect to the active RM anyway. The only concern is that we 
should make sure the RM is alive.

> yarn application -list returns a tracking URL for AM that doesn't work in 
> secured and HA environment
> 
>
> Key: YARN-6625
> URL: https://issues.apache.org/jira/browse/YARN-6625
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: amrmproxy
>Affects Versions: 3.0.0-alpha2
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6625.001.patch, YARN-6625.002.patch
>
>
> The tracking URL given at the command line should work whether the cluster is 
> secured or not. The tracking URLs look like http://node-2.abc.com:47014, and 
> the AM web server is supposed to redirect to an RM address like 
> http://node-1.abc.com:8088/proxy/application_1494544954891_0002/, but it 
> fails to do that because the connection is rejected when the AM talks to the 
> RM admin service to get the HA status.






[jira] [Commented] (YARN-6625) yarn application -list returns a tracking URL for AM that doesn't work in secured and HA environment

2017-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060146#comment-16060146
 ] 

Hadoop QA commented on YARN-6625:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  2m 
30s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
 5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m  9s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy:
 The patch generated 1 new + 13 unchanged - 0 fixed = 14 total (was 13) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
22s{color} | {color:green} hadoop-yarn-server-web-proxy in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 21m 31s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-6625 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12874148/YARN-6625.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux ac8dd4d8289f 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 
09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 6d116ff |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/16225/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-web-proxy.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/16225/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy 
U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy 
|
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/16225/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> yarn application -list returns a tracking URL for AM that doesn't work in 
> secured and HA environment
> 
>
> Key: YARN-6625
> URL: https://issues.apache.org/jira/browse/YARN-6625
> 

[jira] [Commented] (YARN-6625) yarn application -list returns a tracking URL for AM that doesn't work in secured and HA environment

2017-06-22 Thread Yufei Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060140#comment-16060140
 ] 

Yufei Gu commented on YARN-6625:


Thanks [~rkanter] for the review. I uploaded patch v2 based on our offline 
discussion.
First, we don't use the solution in patch v1; it seems like it is not a good 
idea to let the AM access the admin service of the RM. Meanwhile, the AM 
cannot get the RM HA status through the scheduler service, since there is no 
scheduler service in a standby RM.

Patch v2 provides a solution that tries the RMs one by one until we find an 
alive RM, based on the assumption that it doesn't matter whether the RM is 
active or standby in this situation: if the AM redirects to a standby RM, the 
standby RM will redirect to the active RM anyway. The only concern is that we 
should make sure the RM is alive.

> yarn application -list returns a tracking URL for AM that doesn't work in 
> secured and HA environment
> 
>
> Key: YARN-6625
> URL: https://issues.apache.org/jira/browse/YARN-6625
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: amrmproxy
>Affects Versions: 3.0.0-alpha2
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6625.001.patch, YARN-6625.002.patch
>
>
> The tracking URL given at the command line should work whether the cluster is 
> secured or not. The tracking URLs look like http://node-2.abc.com:47014, and 
> the AM web server is supposed to redirect to an RM address like 
> http://node-1.abc.com:8088/proxy/application_1494544954891_0002/, but it 
> fails to do that because the connection is rejected when the AM talks to the 
> RM admin service to get the HA status.






[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2017-06-22 Thread Benyi Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060139#comment-16060139
 ] 

Benyi Wang commented on YARN-1492:
--

Hi [~ctrezzo],

bq. The shared cache leverages checksumming and the node manager local cache to 
ensure applications can reuse resources that are already localized on node 
managers. 

Questions about the cache on the NodeManager:
* Could you explain in detail how the shared cache leverages the node manager 
local cache? 
* Are those shared jars marked as PUBLIC? 
* Could you point me to the source code that handles this?

> truly shared cache for jars (jobjar/libjar)
> ---
>
> Key: YARN-1492
> URL: https://issues.apache.org/jira/browse/YARN-1492
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.0.4-alpha
>Reporter: Sangjin Lee
>Assignee: Chris Trezzo
> Attachments: shared_cache_design.pdf, shared_cache_design_v2.pdf, 
> shared_cache_design_v3.pdf, shared_cache_design_v4.pdf, 
> shared_cache_design_v5.pdf, shared_cache_design_v6.pdf, 
> YARN-1492-all-trunk-v1.patch, YARN-1492-all-trunk-v2.patch, 
> YARN-1492-all-trunk-v3.patch, YARN-1492-all-trunk-v4.patch, 
> YARN-1492-all-trunk-v5.patch
>
>
> Currently there is the distributed cache that enables you to cache jars and 
> files so that attempts from the same job can reuse them. However, sharing is 
> limited with the distributed cache because it is normally on a per-job basis. 
> On a large cluster, sometimes copying of jobjars and libjars becomes so 
> prevalent that it consumes a large portion of the network bandwidth, not to 
> speak of defeating the purpose of "bringing compute to where data is". This 
> is wasteful because in most cases code doesn't change much across many jobs.
> I'd like to propose and discuss the feasibility of introducing a truly shared 
> cache so that multiple jobs from multiple users can share and cache jars. 
> This JIRA is to open the discussion.






[jira] [Updated] (YARN-6625) yarn application -list returns a tracking URL for AM that doesn't work in secured and HA environment

2017-06-22 Thread Yufei Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yufei Gu updated YARN-6625:
---
Attachment: YARN-6625.002.patch

> yarn application -list returns a tracking URL for AM that doesn't work in 
> secured and HA environment
> 
>
> Key: YARN-6625
> URL: https://issues.apache.org/jira/browse/YARN-6625
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: amrmproxy
>Affects Versions: 3.0.0-alpha2
>Reporter: Yufei Gu
>Assignee: Yufei Gu
> Attachments: YARN-6625.001.patch, YARN-6625.002.patch
>
>
> The tracking URL given at the command line should work whether the cluster is 
> secured or not. The tracking URLs look like http://node-2.abc.com:47014, and 
> the AM web server is supposed to redirect to an RM address like 
> http://node-1.abc.com:8088/proxy/application_1494544954891_0002/, but it 
> fails to do that because the connection is rejected when the AM talks to the 
> RM admin service to get the HA status.






[jira] [Updated] (YARN-6127) Add support for work preserving NM restart when AMRMProxy is enabled

2017-06-22 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-6127:
--
Target Version/s: 2.9.0, 3.0.0-beta1  (was: 3.0.0-beta1)

> Add support for work preserving NM restart when AMRMProxy is enabled
> 
>
> Key: YARN-6127
> URL: https://issues.apache.org/jira/browse/YARN-6127
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: amrmproxy, nodemanager
>Reporter: Subru Krishnan
>Assignee: Botong Huang
> Attachments: YARN-6127-branch-2.v1.patch, YARN-6127.v1.patch, 
> YARN-6127.v2.patch, YARN-6127.v3.patch, YARN-6127.v4.patch
>
>
> YARN-1336 added the ability to restart the NM without losing any running 
> containers. In a Federated YARN environment, there's additional state in the 
> {{AMRMProxy}} to allow for spanning across multiple sub-clusters, so we need 
> to enhance {{AMRMProxy}} to support work-preserving restart.






[jira] [Commented] (YARN-6127) Add support for work preserving NM restart when AMRMProxy is enabled

2017-06-22 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060094#comment-16060094
 ] 

Arun Suresh commented on YARN-6127:
---

Not sure why Jenkins gave a compilation error.
Committed to branch-2 as well. Thanks for the branch-2 patch [~botong]. It 
works fine locally.

> Add support for work preserving NM restart when AMRMProxy is enabled
> 
>
> Key: YARN-6127
> URL: https://issues.apache.org/jira/browse/YARN-6127
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: amrmproxy, nodemanager
>Reporter: Subru Krishnan
>Assignee: Botong Huang
> Attachments: YARN-6127-branch-2.v1.patch, YARN-6127.v1.patch, 
> YARN-6127.v2.patch, YARN-6127.v3.patch, YARN-6127.v4.patch
>
>
> YARN-1336 added the ability to restart the NM without losing any running 
> containers. In a Federated YARN environment, there's additional state in the 
> {{AMRMProxy}} to allow for spanning across multiple sub-clusters, so we need 
> to enhance {{AMRMProxy}} to support work-preserving restart.






[jira] [Created] (YARN-6732) Could not find artifact org.apache.hadoop:hadoop-azure-datalake:jar

2017-06-22 Thread Miklos Szegedi (JIRA)
Miklos Szegedi created YARN-6732:


 Summary: Could not find artifact 
org.apache.hadoop:hadoop-azure-datalake:jar
 Key: YARN-6732
 URL: https://issues.apache.org/jira/browse/YARN-6732
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Miklos Szegedi


I get the following build error when resolving dependencies:
{code}
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 04:03 min
[INFO] Finished at: 2017-06-22T18:34:55+00:00
[INFO] Final Memory: 69M/167M
[INFO] 
[ERROR] Failed to execute goal on project hadoop-tools-dist: Could not resolve 
dependencies for project 
org.apache.hadoop:hadoop-tools-dist:jar:2.9.0-SNAPSHOT: Could not find artifact 
org.apache.hadoop:hadoop-azure-datalake:jar:2.9.0-SNAPSHOT in 
apache.snapshots.https 
(https://repository.apache.org/content/repositories/snapshots) -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hadoop-tools-dist
The command '/bin/sh -c mvn dependency:resolve' returned a non-zero code: 1
{code}







[jira] [Commented] (YARN-6127) Add support for work preserving NM restart when AMRMProxy is enabled

2017-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16060003#comment-16060003
 ] 

Hadoop QA commented on YARN-6127:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
36s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
37s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
30s{color} | {color:red} root in branch-2 failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  1m 
19s{color} | {color:red} hadoop-yarn-server in branch-2 failed with JDK 
v1.8.0_131. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  1m 
14s{color} | {color:red} hadoop-yarn-server in branch-2 failed with JDK 
v1.7.0_131. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
57s{color} | {color:green} branch-2 passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
{color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
0s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
31s{color} | {color:green} branch-2 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} branch-2 passed with JDK v1.7.0_131 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  
9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
46s{color} | {color:red} hadoop-yarn-server in the patch failed with JDK 
v1.8.0_131. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 46s{color} 
| {color:red} hadoop-yarn-server in the patch failed with JDK v1.8.0_131. 
{color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
58s{color} | {color:red} hadoop-yarn-server in the patch failed with JDK 
v1.7.0_131. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 58s{color} 
| {color:red} hadoop-yarn-server in the patch failed with JDK v1.7.0_131. 
{color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
32s{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server: 
The patch generated 0 new + 198 unchanged - 4 fixed = 198 total (was 202) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
{color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-jdk1.8.0_131
 with JDK v1.8.0_131 generated 0 new + 197 unchanged - 26 fixed = 197 total 
(was 223) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m  
7s{color} | {color:green} hadoop-yarn-server-tests in the patch passed with JDK 
v1.8.0_131. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 14m 17s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
v1.7.0_131. {color} |
| 

[jira] [Created] (YARN-6731) Add ability to export scheduler configuration XML

2017-06-22 Thread Jonathan Hung (JIRA)
Jonathan Hung created YARN-6731:
---

 Summary: Add ability to export scheduler configuration XML
 Key: YARN-6731
 URL: https://issues.apache.org/jira/browse/YARN-6731
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Jonathan Hung
Assignee: Jonathan Hung


This is useful for debugging/cluster migration/peace of mind.






[jira] [Reopened] (YARN-6127) Add support for work preserving NM restart when AMRMProxy is enabled

2017-06-22 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh reopened YARN-6127:
---

Re-opening to test branch-2 patch

> Add support for work preserving NM restart when AMRMProxy is enabled
> 
>
> Key: YARN-6127
> URL: https://issues.apache.org/jira/browse/YARN-6127
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: amrmproxy, nodemanager
>Reporter: Subru Krishnan
>Assignee: Botong Huang
> Attachments: YARN-6127-branch-2.v1.patch, YARN-6127.v1.patch, 
> YARN-6127.v2.patch, YARN-6127.v3.patch, YARN-6127.v4.patch
>
>
> YARN-1336 added the ability to restart the NM without losing any running 
> containers. In a Federated YARN environment, there's additional state in the 
> {{AMRMProxy}} to allow for spanning across multiple sub-clusters, so we need 
> to enhance {{AMRMProxy}} to support work-preserving restart.






[jira] [Updated] (YARN-6127) Add support for work preserving NM restart when AMRMProxy is enabled

2017-06-22 Thread Botong Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Botong Huang updated YARN-6127:
---
Attachment: YARN-6127-branch-2.v1.patch

> Add support for work preserving NM restart when AMRMProxy is enabled
> 
>
> Key: YARN-6127
> URL: https://issues.apache.org/jira/browse/YARN-6127
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: amrmproxy, nodemanager
>Reporter: Subru Krishnan
>Assignee: Botong Huang
> Attachments: YARN-6127-branch-2.v1.patch, YARN-6127.v1.patch, 
> YARN-6127.v2.patch, YARN-6127.v3.patch, YARN-6127.v4.patch
>
>
> YARN-1336 added the ability to restart the NM without losing any running 
> containers. In a Federated YARN environment, there's additional state in the 
> {{AMRMProxy}} to allow for spanning across multiple sub-clusters, so we need 
> to enhance {{AMRMProxy}} to support work-preserving restart.






[jira] [Commented] (YARN-6673) Add cpu cgroup configurations for opportunistic containers

2017-06-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059883#comment-16059883
 ] 

ASF GitHub Bot commented on YARN-6673:
--

Github user haibchen commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/232#discussion_r123601141
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsCpuResourceHandlerImpl.java
 ---
@@ -181,16 +184,23 @@ public static boolean cpuLimitsExist(String path)
   @Override
   public List<PrivilegedOperation> preStart(Container container)
   throws ResourceHandlerException {
-
 String cgroupId = container.getContainerId().toString();
 Resource containerResource = container.getResource();
 cGroupsHandler.createCGroup(CPU, cgroupId);
 try {
   int containerVCores = containerResource.getVirtualCores();
-  int cpuShares = CPU_DEFAULT_WEIGHT * containerVCores;
-  cGroupsHandler
-  .updateCGroupParam(CPU, cgroupId, 
CGroupsHandler.CGROUP_CPU_SHARES,
-  String.valueOf(cpuShares));
+  ContainerTokenIdentifier id = 
container.getContainerTokenIdentifier();
+  if (id != null && id.getExecutionType() ==
+  ExecutionType.OPPORTUNISTIC) {
+cGroupsHandler
+.updateCGroupParam(CPU, cgroupId, 
CGroupsHandler.CGROUP_CPU_SHARES,
+String.valueOf(CPU_DEFAULT_WEIGHT_OPPORTUNISTIC));
+  } else {
+int cpuShares = CPU_DEFAULT_WEIGHT * containerVCores;
+cGroupsHandler
+.updateCGroupParam(CPU, cgroupId, 
CGroupsHandler.CGROUP_CPU_SHARES,
+String.valueOf(cpuShares));
+  }
   if (strictResourceUsageMode) {
--- End diff --

I see. Thanks for the clarification.


> Add cpu cgroup configurations for opportunistic containers
> --
>
> Key: YARN-6673
> URL: https://issues.apache.org/jira/browse/YARN-6673
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Haibo Chen
>Assignee: Miklos Szegedi
>
> In addition to setting cpu.cfs_period_us on a per-container basis, we could 
> also set cpu.shares to 2 for opportunistic containers so they are run on a 
> best-effort basis



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6673) Add cpu cgroup configurations for opportunistic containers

2017-06-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059881#comment-16059881
 ] 

ASF GitHub Bot commented on YARN-6673:
--

Github user haibchen commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/232#discussion_r123601076
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsCpuResourceHandlerImpl.java
 ---
@@ -181,16 +184,23 @@ public static boolean cpuLimitsExist(String path)
   @Override
   public List preStart(Container container)
   throws ResourceHandlerException {
-
 String cgroupId = container.getContainerId().toString();
 Resource containerResource = container.getResource();
 cGroupsHandler.createCGroup(CPU, cgroupId);
 try {
   int containerVCores = containerResource.getVirtualCores();
-  int cpuShares = CPU_DEFAULT_WEIGHT * containerVCores;
-  cGroupsHandler
-  .updateCGroupParam(CPU, cgroupId, 
CGroupsHandler.CGROUP_CPU_SHARES,
-  String.valueOf(cpuShares));
+  ContainerTokenIdentifier id = 
container.getContainerTokenIdentifier();
+  if (id != null && id.getExecutionType() ==
+  ExecutionType.OPPORTUNISTIC) {
--- End diff --

Yeah. In some sense, it is more about being descriptive than about code reuse. Maybe 
we could introduce an inline boolean variable with such a name?
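
A minimal sketch of that suggestion, reusing the names from the diff above (the 
variable name itself is hypothetical, not part of the patch):

{code}
// Name the condition once instead of inlining it, so the intent reads directly.
ContainerTokenIdentifier id = container.getContainerTokenIdentifier();
boolean isOpportunistic =
    id != null && id.getExecutionType() == ExecutionType.OPPORTUNISTIC;
int cpuShares = isOpportunistic
    ? CPU_DEFAULT_WEIGHT_OPPORTUNISTIC
    : CPU_DEFAULT_WEIGHT * containerResource.getVirtualCores();
cGroupsHandler.updateCGroupParam(CPU, cgroupId,
    CGroupsHandler.CGROUP_CPU_SHARES, String.valueOf(cpuShares));
{code}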


> Add cpu cgroup configurations for opportunistic containers
> --
>
> Key: YARN-6673
> URL: https://issues.apache.org/jira/browse/YARN-6673
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Haibo Chen
>Assignee: Miklos Szegedi
>
> In addition to setting cpu.cfs_period_us on a per-container basis, we could 
> also set cpu.shares to 2 for opportunistic containers so they are run on a 
> best-effort basis



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6127) Add support for work preserving NM restart when AMRMProxy is enabled

2017-06-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059858#comment-16059858
 ] 

Hudson commented on YARN-6127:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11908 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11908/])
YARN-6127. Add support for work preserving NM restart when AMRMProxy is (arun 
suresh: rev 49aa60e50d20f8c18ed6f00fa8966244536fe7da)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMStateStoreService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AbstractRequestInterceptor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMNullStateStoreService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMMemoryStateStoreService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/RequestInterceptor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/recovery/NMLeveldbStateStoreService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/TestAMRMProxyService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AMRMProxyApplicationContextImpl.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/recovery/TestNMLeveldbStateStoreService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AMRMProxyService.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/BaseAMRMProxyTest.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/AMRMProxyTokenSecretManager.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/amrmproxy/TestAMRMProxyTokenSecretManager.java


> Add support for work preserving NM restart when AMRMProxy is enabled
> 
>
> Key: YARN-6127
> URL: https://issues.apache.org/jira/browse/YARN-6127
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: amrmproxy, nodemanager
>Reporter: Subru Krishnan
>Assignee: Botong Huang
> Attachments: YARN-6127.v1.patch, YARN-6127.v2.patch, 
> YARN-6127.v3.patch, YARN-6127.v4.patch
>
>
> YARN-1336 added the ability to restart NM without losing any running 
> containers. In a Federated YARN environment, there's additional state in the 
> {{AMRMProxy}} to allow for spanning across multiple sub-clusters, so we need 
> to enhance {{AMRMProxy}} to support work-preserving restart.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6674) Add memory cgroup settings for opportunistic containers

2017-06-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059829#comment-16059829
 ] 

ASF GitHub Bot commented on YARN-6674:
--

Github user szegedim commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/240#discussion_r123589398
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsMemoryResourceHandlerImpl.java
 ---
@@ -122,12 +128,23 @@ int getSwappiness() {
   cGroupsHandler.updateCGroupParam(MEMORY, cgroupId,
   CGroupsHandler.CGROUP_PARAM_MEMORY_HARD_LIMIT_BYTES,
   String.valueOf(containerHardLimit) + "M");
-  cGroupsHandler.updateCGroupParam(MEMORY, cgroupId,
-  CGroupsHandler.CGROUP_PARAM_MEMORY_SOFT_LIMIT_BYTES,
-  String.valueOf(containerSoftLimit) + "M");
-  cGroupsHandler.updateCGroupParam(MEMORY, cgroupId,
-  CGroupsHandler.CGROUP_PARAM_MEMORY_SWAPPINESS,
-  String.valueOf(swappiness));
+  ContainerTokenIdentifier id = 
container.getContainerTokenIdentifier();
+  if (id != null && id.getExecutionType() ==
+  ExecutionType.OPPORTUNISTIC) {
+cGroupsHandler.updateCGroupParam(MEMORY, cgroupId,
+CGroupsHandler.CGROUP_PARAM_MEMORY_SOFT_LIMIT_BYTES,
+String.valueOf(OPPORTUNISTIC_SOFT_LIMIT) + "M");
+cGroupsHandler.updateCGroupParam(MEMORY, cgroupId,
--- End diff --

Swapping may cause issues, yes; however, we cannot tell the user to turn it 
off.


> Add memory cgroup settings for opportunistic containers
> ---
>
> Key: YARN-6674
> URL: https://issues.apache.org/jira/browse/YARN-6674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Miklos Szegedi
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6127) Add support for work preserving NM restart when AMRMProxy is enabled

2017-06-22 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059826#comment-16059826
 ] 

Arun Suresh commented on YARN-6127:
---

Committed this to trunk. Thanks [~botong]

> Add support for work preserving NM restart when AMRMProxy is enabled
> 
>
> Key: YARN-6127
> URL: https://issues.apache.org/jira/browse/YARN-6127
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: amrmproxy, nodemanager
>Reporter: Subru Krishnan
>Assignee: Botong Huang
> Attachments: YARN-6127.v1.patch, YARN-6127.v2.patch, 
> YARN-6127.v3.patch, YARN-6127.v4.patch
>
>
> YARN-1336 added the ability to restart NM without losing any running 
> containers. In a Federated YARN environment, there's additional state in the 
> {{AMRMProxy}} to allow for spanning across multiple sub-clusters, so we need 
> to enhance {{AMRMProxy}} to support work-preserving restart.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5648) [ATSv2 Security] Client side changes for authentication

2017-06-22 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059820#comment-16059820
 ] 

Varun Saxena commented on YARN-5648:


Findbugs warnings are existing issues and unrelated. Test failures are 
outstanding issues on trunk.

> [ATSv2 Security] Client side changes for authentication
> ---
>
> Key: YARN-5648
> URL: https://issues.apache.org/jira/browse/YARN-5648
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-5355-merge-blocker
> Attachments: YARN-5648-YARN-5355.02.patch, 
> YARN-5648-YARN-5355.03.patch, YARN-5648-YARN-5355.04.patch, 
> YARN-5648-YARN-5355.wip.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6719) Fix findbugs warnings in SLSCapacityScheduler.java

2017-06-22 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059817#comment-16059817
 ] 

Zhe Zhang commented on YARN-6719:
-

Removed release-blocker label since it's already committed to branch-2.7

> Fix findbugs warnings in SLSCapacityScheduler.java
> --
>
> Key: YARN-6719
> URL: https://issues.apache.org/jira/browse/YARN-6719
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
> Fix For: 2.9.0, 2.7.4, 2.8.2
>
> Attachments: YARN-6719-branch-2.01.patch, 
> YARN-6719-branch-2.8-01.patch
>
>
> There are 2 findbugs warnings in branch-2. 
> https://builds.apache.org/job/PreCommit-HADOOP-Build/12560/artifact/patchprocess/branch-findbugs-hadoop-tools_hadoop-sls-warnings.html
> {noformat}
> DmFound reliance on default encoding in 
> org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.initMetrics(): new 
> java.io.FileWriter(String)
> Bug type DM_DEFAULT_ENCODING (click for details) 
> In class org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler
> In method 
> org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.initMetrics()
> Called method new java.io.FileWriter(String)
> At SLSCapacityScheduler.java:[line 464]
> DmFound reliance on default encoding in new 
> org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler$MetricsLogRunnable(SLSCapacityScheduler):
>  new java.io.FileWriter(String)
> Bug type DM_DEFAULT_ENCODING (click for details) 
> In class 
> org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler$MetricsLogRunnable
> In method new 
> org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler$MetricsLogRunnable(SLSCapacityScheduler)
> Called method new java.io.FileWriter(String)
> At SLSCapacityScheduler.java:[line 669]
> {noformat}
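
For context, the usual way to clear a DM_DEFAULT_ENCODING warning is to name the 
charset explicitly instead of relying on the platform default; a minimal, 
self-contained sketch (not the attached patch) would be:

{code}
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ExplicitCharsetWriterExample {
  // Replaces "new FileWriter(path)", which uses the platform default encoding,
  // with a writer bound to an explicit charset, which is what findbugs asks for.
  static BufferedWriter openMetricsLog(String path) throws IOException {
    return Files.newBufferedWriter(Paths.get(path), StandardCharsets.UTF_8);
  }
}
{code}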



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6719) Fix findbugs warnings in SLSCapacityScheduler.java

2017-06-22 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated YARN-6719:

Labels:   (was: release-blocker)

> Fix findbugs warnings in SLSCapacityScheduler.java
> --
>
> Key: YARN-6719
> URL: https://issues.apache.org/jira/browse/YARN-6719
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
> Fix For: 2.9.0, 2.7.4, 2.8.2
>
> Attachments: YARN-6719-branch-2.01.patch, 
> YARN-6719-branch-2.8-01.patch
>
>
> There are 2 findbugs warnings in branch-2. 
> https://builds.apache.org/job/PreCommit-HADOOP-Build/12560/artifact/patchprocess/branch-findbugs-hadoop-tools_hadoop-sls-warnings.html
> {noformat}
> DmFound reliance on default encoding in 
> org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.initMetrics(): new 
> java.io.FileWriter(String)
> Bug type DM_DEFAULT_ENCODING (click for details) 
> In class org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler
> In method 
> org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler.initMetrics()
> Called method new java.io.FileWriter(String)
> At SLSCapacityScheduler.java:[line 464]
> DmFound reliance on default encoding in new 
> org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler$MetricsLogRunnable(SLSCapacityScheduler):
>  new java.io.FileWriter(String)
> Bug type DM_DEFAULT_ENCODING (click for details) 
> In class 
> org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler$MetricsLogRunnable
> In method new 
> org.apache.hadoop.yarn.sls.scheduler.SLSCapacityScheduler$MetricsLogRunnable(SLSCapacityScheduler)
> Called method new java.io.FileWriter(String)
> At SLSCapacityScheduler.java:[line 669]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6127) Add support for work preserving NM restart when AMRMProxy is enabled

2017-06-22 Thread Botong Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059809#comment-16059809
 ] 

Botong Huang commented on YARN-6127:


Thanks [~asuresh]! YARN-6730 created to follow up. 

> Add support for work preserving NM restart when AMRMProxy is enabled
> 
>
> Key: YARN-6127
> URL: https://issues.apache.org/jira/browse/YARN-6127
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: amrmproxy, nodemanager
>Reporter: Subru Krishnan
>Assignee: Botong Huang
> Attachments: YARN-6127.v1.patch, YARN-6127.v2.patch, 
> YARN-6127.v3.patch, YARN-6127.v4.patch
>
>
> YARN-1336 added the ability to restart NM without losing any running 
> containers. In a Federated YARN environment, there's additional state in the 
> {{AMRMProxy}} to allow for spanning across multiple sub-clusters, so we need 
> to enhance {{AMRMProxy}} to support work-preserving restart.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6729) NM percentage-physical-cpu-limit should be always 100 if DefaultLCEResourcesHandler is used

2017-06-22 Thread Yufei Gu (JIRA)
Yufei Gu created YARN-6729:
--

 Summary: NM percentage-physical-cpu-limit should be always 100 if 
DefaultLCEResourcesHandler is used
 Key: YARN-6729
 URL: https://issues.apache.org/jira/browse/YARN-6729
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 3.0.0-alpha3
Reporter: Yufei Gu


NM percentage-physical-cpu-limit is not honored in DefaultLCEResourcesHandler, 
which may cause container cpu usage calculation issues, e.g. container vcore 
usage is potentially more than 100% if percentage-physical-cpu-limit is set to 
a value less than 100. 
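
A rough illustration of the accounting problem with hypothetical numbers (not the 
exact ContainersMonitor formula):

{code}
// 8 physical cores, percentage-physical-cpu-limit = 50: YARN accounts for
// only 4 cores' worth of CPU on this node.
float physicalCores = 8f;
float cpuLimitPercent = 50f;
float yarnCores = physicalCores * cpuLimitPercent / 100f;   // 4.0

// DefaultLCEResourcesHandler does not enforce the limit, so a container can
// burn more physical CPU than the share YARN thinks it controls.
float measuredCores = 5f;
float reportedUtilization = measuredCores / yarnCores;      // 1.25 -> 125%
{code}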



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6730) Make sure NM state store is not null consistently

2017-06-22 Thread Botong Huang (JIRA)
Botong Huang created YARN-6730:
--

 Summary: Make sure NM state store is not null consistently
 Key: YARN-6730
 URL: https://issues.apache.org/jira/browse/YARN-6730
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Botong Huang
Assignee: Botong Huang
Priority: Minor


In the NM statestore for NM restart, there are a lot of places where we check 
if the stateStore != null. This is true in the existing codebase too. Ideally, 
the stateStore should never be null because we have the NullStateStore 
implementation and we should not have to perform so many defensive checks.
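
A minimal sketch of the null-object idea behind NMNullStateStoreService (the 
variables recoveryEnabled and conf are assumptions, not code from any patch): 
callers always receive a non-null store, so the {{stateStore != null}} guards 
become unnecessary.

{code}
// Recovery enabled: use the real leveldb-backed store; otherwise use the
// no-op store. Either way, callers never see a null stateStore.
NMStateStoreService stateStore = recoveryEnabled        // recoveryEnabled is assumed
    ? new NMLeveldbStateStoreService()
    : new NMNullStateStoreService();                    // every store* method is a no-op
stateStore.init(conf);
stateStore.start();
{code}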



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-6127) Add support for work preserving NM restart when AMRMProxy is enabled

2017-06-22 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059800#comment-16059800
 ] 

Arun Suresh edited comment on YARN-6127 at 6/22/17 6:18 PM:


Will commit this shortly.
One suggestion though - there are a lot of places where we check if the 
stateStore != null. This is true in the existing codebase too. Ideally, the 
stateStore should never be null and we should not have to perform so many 
defensive checks. [~botong], can you open a followup JIRA to fix this ?


was (Author: asuresh):
Will commit this shortly.
One suggestion though - there are a lot of places where we check if the 
stateStore != null. This is true in the existing codebase too. Ideally, the 
stateStore should never be null and we should have to perform so many defensive 
checks. [~botong], can you open a followup JIRA to fix this ?

> Add support for work preserving NM restart when AMRMProxy is enabled
> 
>
> Key: YARN-6127
> URL: https://issues.apache.org/jira/browse/YARN-6127
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: amrmproxy, nodemanager
>Reporter: Subru Krishnan
>Assignee: Botong Huang
> Attachments: YARN-6127.v1.patch, YARN-6127.v2.patch, 
> YARN-6127.v3.patch, YARN-6127.v4.patch
>
>
> YARN-1336 added the ability to restart NM without losing any running 
> containers. In a Federated YARN environment, there's additional state in the 
> {{AMRMProxy}} to allow for spanning across multiple sub-clusters, so we need 
> to enhance {{AMRMProxy}} to support work-preserving restart.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6127) Add support for work preserving NM restart when AMRMProxy is enabled

2017-06-22 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059800#comment-16059800
 ] 

Arun Suresh commented on YARN-6127:
---

Will commit this shortly.
One suggestion though - there are a lot of places where we check if the 
stateStore != null. This is true in the existing codebase too. Ideally, the 
stateStore should never be null and we should have to perform so many defensive 
checks. [~botong], can you open a followup JIRA to fix this ?

> Add support for work preserving NM restart when AMRMProxy is enabled
> 
>
> Key: YARN-6127
> URL: https://issues.apache.org/jira/browse/YARN-6127
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: amrmproxy, nodemanager
>Reporter: Subru Krishnan
>Assignee: Botong Huang
> Attachments: YARN-6127.v1.patch, YARN-6127.v2.patch, 
> YARN-6127.v3.patch, YARN-6127.v4.patch
>
>
> YARN-1336 added the ability to restart NM without losing any running 
> containers. In a Federated YARN environment, there's additional state in the 
> {{AMRMProxy}} to allow for spanning across multiple sub-clusters, so we need 
> to enhance {{AMRMProxy}} to support work-preserving restart.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-5950) Create StoreConfigurationProvider to construct a Configuration from the backing store

2017-06-22 Thread Jonathan Hung (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hung resolved YARN-5950.
-
Resolution: Duplicate

This was handled in YARN-5948.

> Create StoreConfigurationProvider to construct a Configuration from the 
> backing store
> -
>
> Key: YARN-5950
> URL: https://issues.apache.org/jira/browse/YARN-5950
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Hung
>
> The StoreConfigurationProvider will query the YarnConfigurationStore for 
> various configuration keys, and construct a Configuration object out of it 
> (to be passed to the scheduler, and possibly other YARN components).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6673) Add cpu cgroup configurations for opportunistic containers

2017-06-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059785#comment-16059785
 ] 

ASF GitHub Bot commented on YARN-6673:
--

Github user szegedim commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/232#discussion_r123583254
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsCpuResourceHandlerImpl.java
 ---
@@ -181,16 +184,23 @@ public static boolean cpuLimitsExist(String path)
   @Override
   public List preStart(Container container)
   throws ResourceHandlerException {
-
 String cgroupId = container.getContainerId().toString();
 Resource containerResource = container.getResource();
 cGroupsHandler.createCGroup(CPU, cgroupId);
 try {
   int containerVCores = containerResource.getVirtualCores();
-  int cpuShares = CPU_DEFAULT_WEIGHT * containerVCores;
-  cGroupsHandler
-  .updateCGroupParam(CPU, cgroupId, 
CGroupsHandler.CGROUP_CPU_SHARES,
-  String.valueOf(cpuShares));
+  ContainerTokenIdentifier id = 
container.getContainerTokenIdentifier();
+  if (id != null && id.getExecutionType() ==
+  ExecutionType.OPPORTUNISTIC) {
+cGroupsHandler
+.updateCGroupParam(CPU, cgroupId, 
CGroupsHandler.CGROUP_CPU_SHARES,
+String.valueOf(CPU_DEFAULT_WEIGHT_OPPORTUNISTIC));
+  } else {
+int cpuShares = CPU_DEFAULT_WEIGHT * containerVCores;
+cGroupsHandler
+.updateCGroupParam(CPU, cgroupId, 
CGroupsHandler.CGROUP_CPU_SHARES,
+String.valueOf(cpuShares));
+  }
   if (strictResourceUsageMode) {
--- End diff --

Yes, I think so. If the admin chooses strict cpu limits, all containers 
should get strict cpu limits based on vcores. Opportunistic ones will still be 
throttled by cpu.shares if guaranteed containers are running. This is just a cap, 
so that opportunistic containers with different thread counts do not affect each 
other negatively.
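
As a rough sketch of the two knobs being described (the quota computation and the 
helper below are illustrative assumptions, not the handler code):

{code}
// Relative weight: opportunistic containers get the minimum cpu.shares, so
// they are throttled whenever guaranteed containers want the CPU.
cGroupsHandler.updateCGroupParam(CPU, cgroupId,
    CGroupsHandler.CGROUP_CPU_SHARES,
    String.valueOf(CPU_DEFAULT_WEIGHT_OPPORTUNISTIC));

// Hard cap (strict mode): CFS bandwidth control limits every container,
// opportunistic or not, to its vcore allotment (illustrative computation).
long periodUs = 1_000_000L;
long quotaUs = (long) (periodUs * containerVCores * vcoreToPhysicalRatio);
writeCfsBandwidth(cgroupId, periodUs, quotaUs);   // hypothetical helper
{code}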


> Add cpu cgroup configurations for opportunistic containers
> --
>
> Key: YARN-6673
> URL: https://issues.apache.org/jira/browse/YARN-6673
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Haibo Chen
>Assignee: Miklos Szegedi
>
> In addition to setting cpu.cfs_period_us on a per-container basis, we could 
> also set cpu.shares to 2 for opportunistic containers so they are run on a 
> best-effort basis



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6673) Add cpu cgroup configurations for opportunistic containers

2017-06-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059787#comment-16059787
 ] 

ASF GitHub Bot commented on YARN-6673:
--

Github user szegedim commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/232#discussion_r123583294
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/TestCGroupsCpuResourceHandlerImpl.java
 ---
@@ -294,4 +296,26 @@ public void testTeardown() throws Exception {
   public void testStrictResourceUsage() throws Exception {
 Assert.assertNull(cGroupsCpuResourceHandler.teardown());
   }
+
+  @Test
+  public void testOpportunistic() throws Exception {
+Configuration conf = new YarnConfiguration();
+
+String id = "container_01_01";
+ContainerId mockContainerId = mock(ContainerId.class);
+when(mockContainerId.toString()).thenReturn(id);
+
+cGroupsCpuResourceHandler.bootstrap(plugin, conf);
--- End diff --

Okay.



> Add cpu cgroup configurations for opportunistic containers
> --
>
> Key: YARN-6673
> URL: https://issues.apache.org/jira/browse/YARN-6673
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Haibo Chen
>Assignee: Miklos Szegedi
>
> In addition to setting cpu.cfs_period_us on a per-container basis, we could 
> also set cpu.shares to 2 for opportunistic containers so they are run on a 
> best-effort basis



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6673) Add cpu cgroup configurations for opportunistic containers

2017-06-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059769#comment-16059769
 ] 

ASF GitHub Bot commented on YARN-6673:
--

Github user szegedim commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/232#discussion_r123582374
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsCpuResourceHandlerImpl.java
 ---
@@ -181,16 +184,23 @@ public static boolean cpuLimitsExist(String path)
   @Override
   public List preStart(Container container)
   throws ResourceHandlerException {
-
 String cgroupId = container.getContainerId().toString();
 Resource containerResource = container.getResource();
 cGroupsHandler.createCGroup(CPU, cgroupId);
 try {
   int containerVCores = containerResource.getVirtualCores();
-  int cpuShares = CPU_DEFAULT_WEIGHT * containerVCores;
-  cGroupsHandler
-  .updateCGroupParam(CPU, cgroupId, 
CGroupsHandler.CGROUP_CPU_SHARES,
-  String.valueOf(cpuShares));
+  ContainerTokenIdentifier id = 
container.getContainerTokenIdentifier();
+  if (id != null && id.getExecutionType() ==
+  ExecutionType.OPPORTUNISTIC) {
--- End diff --

Hmm, this is just 2 lines of code. Adding a function would be at least 5 
more lines. Ideally the container would have this function, but that might be too 
much churn for the interface.


> Add cpu cgroup configurations for opportunistic containers
> --
>
> Key: YARN-6673
> URL: https://issues.apache.org/jira/browse/YARN-6673
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Haibo Chen
>Assignee: Miklos Szegedi
>
> In addition to setting cpu.cfs_period_us on a per-container basis, we could 
> also set cpu.shares to 2 for opportunistic containers so they are run on a 
> best-effort basis



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6668) Use cgroup to get container resource utilization

2017-06-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059755#comment-16059755
 ] 

ASF GitHub Bot commented on YARN-6668:
--

Github user szegedim commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/241#discussion_r123580271
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsResourceCalculator.java
 ---
@@ -0,0 +1,346 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.util.CpuTimeTracker;
+import org.apache.hadoop.util.Shell;
+import org.apache.hadoop.util.SysInfoLinux;
+import org.apache.hadoop.yarn.exceptions.YarnException;
+import org.apache.hadoop.yarn.util.Clock;
+import org.apache.hadoop.yarn.util.ResourceCalculatorProcessTree;
+import org.apache.hadoop.yarn.util.SystemClock;
+
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.math.BigInteger;
+import java.nio.charset.Charset;
+import java.util.function.Function;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+/**
+ * A cgroups file-system based Resource calculator without the process tree
+ * features.
+ */
+public class CGroupsResourceCalculator extends 
ResourceCalculatorProcessTree {
+  enum Result {
+Continue,
+Exit
+  }
+  private static final String PROCFS = "/proc";
+  static final String CGROUP = "cgroup";
+  static final String CPU_STAT = "cpuacct.stat";
+  static final String MEM_STAT = "memory.usage_in_bytes";
+  static final String MEMSW_STAT = "memory.memsw.usage_in_bytes";
+  private static final String USER = "user ";
+  private static final String SYSTEM = "system ";
+
+  private static final Pattern CGROUP_FILE_FORMAT = Pattern.compile(
+  "^(\\d+):([^:]+):/(.*)$");
+  private final String procfsDir;
+  private CGroupsHandler cGroupsHandler;
+
+  private String pid;
+  private File cpuStat;
+  private File memStat;
+  private File memswStat;
+
+  private final long jiffyLengthMs;
+  private BigInteger processTotalJiffies = BigInteger.ZERO;
+  private final CpuTimeTracker cpuTimeTracker;
+  private Clock clock;
+
+  private final static Object LOCK = new Object();
+  private static boolean firstError = true;
+
+  /**
+   * Create resource calculator for all Yarn containers.
+   * @throws YarnException Could not access cgroups
+   */
+  public CGroupsResourceCalculator() throws YarnException {
+this(null, PROCFS, ResourceHandlerModule.getCGroupsHandler(),
+SystemClock.getInstance());
+  }
+
+  /**
+   * Create resource calculator for the container that has the specified 
pid.
+   * @param pid A pid from the cgroup or null for all containers
+   * @throws YarnException Could not access cgroups
+   */
+  public CGroupsResourceCalculator(String pid) throws YarnException {
+this(pid, PROCFS, ResourceHandlerModule.getCGroupsHandler(),
+SystemClock.getInstance());
+  }
+
+  /**
+   * Create resource calculator for testing.
+   * @param pid A pid from the cgroup or null for all containers
+   * @param procfsDir Path to /proc or a mock /proc directory
+   * @param cGroupsHandler Initialized cgroups handler object
+   * @param clock A clock object
+   * @throws YarnException YarnException Could not access cgroups
+   */
+  @VisibleForTesting
+  CGroupsResourceCalculator(String pid, String procfsDir,
+ 

[jira] [Commented] (YARN-6668) Use cgroup to get container resource utilization

2017-06-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059742#comment-16059742
 ] 

ASF GitHub Bot commented on YARN-6668:
--

Github user szegedim commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/241#discussion_r123577788
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/ContainersMonitorImpl.java
 ---
@@ -588,6 +589,31 @@ private void initializeProcessTrees(
 }
 
 /**
+ * Get the best process tree calculator.
+ * @param pId container process id
+ * @return process tree calculator
+ */
+private ResourceCalculatorProcessTree
+getResourceCalculatorProcessTree(String pId) {
+  ResourceCalculatorProcessTree pt = null;
+
+  // CGroups is best in performance, so try to use it, if it is enabled
+  if (processTreeClass == null &&
--- End diff --

Impossible. CGroupsResourceCalculator relies on cgroups, which are available 
only in the node manager. ResourceCalculatorProcessTree is in the 
hadoop-yarn-common package.
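
A minimal sketch of the fallback in that hunk, reusing the names shown above (the 
exact condition is an assumption, since the hunk is truncated here, and the fields 
cgroupsEnabled, conf and LOG are taken from the surrounding class):

{code}
// Prefer the cgroups-based calculator when no explicit calculator class is
// configured and cgroups are enabled; otherwise fall back to the default
// ResourceCalculatorProcessTree resolution.
private ResourceCalculatorProcessTree getResourceCalculatorProcessTree(String pId) {
  if (processTreeClass == null && cgroupsEnabled) {   // cgroupsEnabled is assumed
    try {
      return new CGroupsResourceCalculator(pId);
    } catch (YarnException e) {
      LOG.warn("Could not create cgroups calculator, falling back", e);
    }
  }
  return ResourceCalculatorProcessTree.getResourceCalculatorProcessTree(
      pId, processTreeClass, conf);
}
{code}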


> Use cgroup to get container resource utilization
> 
>
> Key: YARN-6668
> URL: https://issues.apache.org/jira/browse/YARN-6668
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Miklos Szegedi
>
> Container Monitor relies on the proc file system to get container resource 
> utilization, which is not as efficient as reading cgroup accounting. In the NM, 
> when cgroups are enabled, we should read cgroup stats instead. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6668) Use cgroup to get container resource utilization

2017-06-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059737#comment-16059737
 ] 

ASF GitHub Bot commented on YARN-6668:
--

Github user szegedim commented on a diff in the pull request:

https://github.com/apache/hadoop/pull/241#discussion_r123577119
  
--- Diff: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsResourceCalculator.java
 ---
@@ -0,0 +1,292 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources;
+
+import org.apache.hadoop.util.CpuTimeTracker;
+import org.apache.hadoop.util.Shell;
+import org.apache.hadoop.util.SysInfoLinux;
+import org.apache.hadoop.yarn.exceptions.YarnException;
+import org.apache.hadoop.yarn.util.Clock;
+import org.apache.hadoop.yarn.util.ResourceCalculatorProcessTree;
+import org.apache.hadoop.yarn.util.SystemClock;
+
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.math.BigInteger;
+import java.nio.charset.Charset;
+import java.util.function.Function;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+/**
+ * A cgroups file-system based Resource calculator without the process tree
+ * features.
+ */
+
+public class CGroupsResourceCalculator extends 
ResourceCalculatorProcessTree {
--- End diff --

It supports aggregated container utilization by passing a null pid.
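
A short usage sketch based on the constructors quoted earlier in this thread (the 
pid value is hypothetical):

{code}
// Utilization of one container: pass that container's pid.
CGroupsResourceCalculator perContainer = new CGroupsResourceCalculator("12345");

// Aggregated utilization of all YARN containers: pass null.
CGroupsResourceCalculator allContainers = new CGroupsResourceCalculator(null);
{code}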


> Use cgroup to get container resource utilization
> 
>
> Key: YARN-6668
> URL: https://issues.apache.org/jira/browse/YARN-6668
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Miklos Szegedi
>
> Container Monitor relies on the proc file system to get container resource 
> utilization, which is not as efficient as reading cgroup accounting. In the NM, 
> when cgroups are enabled, we should read cgroup stats instead. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6708) Nodemanager container crash after ext3 folder limit

2017-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059711#comment-16059711
 ] 

Hadoop QA commented on YARN-6708:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
11s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
47s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 in trunk has 5 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
1s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  5m  
6s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 49s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 10 new + 135 unchanged - 5 fixed = 145 total (was 140) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
33s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager
 generated 0 new + 226 unchanged - 1 fixed = 226 total (was 227) {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
34s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 13m 
20s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 62m 47s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-6708 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12874105/YARN-6708.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 7558ca6f45cf 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 8dbd53e |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| findbugs | 
https://builds.apache.org/job/PreCommit-YARN-Build/16223/artifact/patchprocess/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-warnings.html
 |
| checkstyle | 

[jira] [Commented] (YARN-6714) RM crashed with IllegalStateException while handling APP_ATTEMPT_REMOVED event when async-scheduling enabled in CapacityScheduler

2017-06-22 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059710#comment-16059710
 ] 

Sunil G commented on YARN-6714:
---

[~Tao Yang]
I think your fix is correct. However, could you add some more comments in the code 
for future reference? It would help to know in which cases the running attemptId 
may mismatch with app.getApplicationAttemptId().

> RM crashed with IllegalStateException while handling APP_ATTEMPT_REMOVED 
> event when async-scheduling enabled in CapacityScheduler
> -
>
> Key: YARN-6714
> URL: https://issues.apache.org/jira/browse/YARN-6714
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha3
>Reporter: Tao Yang
>Assignee: Tao Yang
> Attachments: YARN-6714.001.patch, YARN-6714.002.patch
>
>
> Currently, in the async-scheduling mode of CapacityScheduler, after an AM fails 
> over and all of its reserved containers are unreserved, the scheduler still has a 
> chance to fetch and commit an outdated reserve proposal of the failed app attempt. 
> This problem happened on an app in our cluster: when this app stopped, it 
> unreserved all reserved containers and compared their appAttemptId with the 
> current appAttemptId; on a mismatch it threw IllegalStateException and crashed 
> the RM.
> Error log:
> {noformat}
> 2017-06-08 11:02:24,339 FATAL [ResourceManager Event Processor] 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ATTEMPT_REMOVED to the scheduler
> java.lang.IllegalStateException: Trying to unreserve  for application 
> appattempt_1495188831758_0121_02 when currently reserved  for application 
> application_1495188831758_0121 on node host: node1:45454 #containers=2 
> available=... used=...
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode.unreserveResource(FiCaSchedulerNode.java:123)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:845)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1787)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1957)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:586)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:966)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1740)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:152)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:822)
> at java.lang.Thread.run(Thread.java:834)
> {noformat}
> When async-scheduling is enabled, CapacityScheduler#doneApplicationAttempt and 
> CapacityScheduler#tryCommit both need to acquire the write lock before executing, 
> so we can check the app attempt state in the commit process to avoid committing 
> outdated proposals.
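
A minimal sketch of that kind of guard (method names follow the discussion above; 
this is a sketch, not the attached patch):

{code}
// Reject a queued proposal if the attempt that generated it is no longer the
// application's current attempt (e.g. after an AM failover).
private boolean isProposalStillValid(ApplicationAttemptId proposedAttempt) {
  FiCaSchedulerApp app = getApplicationAttempt(proposedAttempt);
  return app != null
      && proposedAttempt.equals(app.getApplicationAttemptId());
}
{code}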



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6678) Committer thread crashes with IllegalStateException in async-scheduling mode of CapacityScheduler

2017-06-22 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059706#comment-16059706
 ] 

Sunil G commented on YARN-6678:
---

Thanks [~Tao Yang]. Nice catch.

A few comments:
# In {{FiCaSchedulerApp#accept}}, it's better to use {{RMContainer#equals}} 
instead of using *!=*
# In the same method's debug log, please add the nodeId to the log as well.
# In the test case named {{testCommitOutdatedReservedProposal}}, you can call 
{{rm.stop()}} at the end
# In the same test case
{code}
222 while (true) {
223   if (sn1.getReservedContainer() != null) {
224 break;
225   }
226   Thread.sleep(100);
227 }
{code}
I prefer to reduce the sleep interval and do a bounded number of retries. Though the 
test case has a timeout at the top level, it's better to sleep 10ms with 20 or 50 
retries.
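
A sketch of what that bounded wait could look like (retry counts as suggested above; 
this is not the patch itself):

{code}
// Poll every 10ms, up to 50 times, and fail with a clear message instead of
// relying only on the test-level timeout.
boolean reserved = false;
for (int i = 0; i < 50 && !reserved; i++) {
  reserved = (sn1.getReservedContainer() != null);
  if (!reserved) {
    Thread.sleep(10);
  }
}
Assert.assertTrue("Container was not reserved on sn1 in time", reserved);
{code}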

> Committer thread crashes with IllegalStateException in async-scheduling mode 
> of CapacityScheduler
> -
>
> Key: YARN-6678
> URL: https://issues.apache.org/jira/browse/YARN-6678
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.9.0, 3.0.0-alpha3
>Reporter: Tao Yang
>Assignee: Tao Yang
> Attachments: YARN-6678.001.patch, YARN-6678.002.patch, 
> YARN-6678.003.patch
>
>
> Error log:
> {noformat}
> java.lang.IllegalStateException: Trying to reserve container 
> container_e10_1495599791406_7129_01_001453 for application 
> appattempt_1495599791406_7129_01 when currently reserved container 
> container_e10_1495599791406_7123_01_001513 on node host: node0123:45454 
> #containers=40 available=... used=...
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode.reserveResource(FiCaSchedulerNode.java:81)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.reserve(FiCaSchedulerApp.java:1079)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:795)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2770)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:546)
> {noformat}
> Reproduce this problem:
> 1. nm1 re-reserved app-1/container-X1 and generated reserve proposal-1
> 2. nm2 had enough resource for app-1, un-reserved app-1/container-X1 and 
> allocated app-1/container-X2
> 3. nm1 reserved app-2/container-Y
> 4. proposal-1 was accepted but throw IllegalStateException when applying
> Currently the check code for reserve proposal in FiCaSchedulerApp#accept as 
> follows:
> {code}
>   // Container reserved first time will be NEW, after the container
>   // accepted & confirmed, it will become RESERVED state
>   if (schedulerContainer.getRmContainer().getState()
>   == RMContainerState.RESERVED) {
> // Set reReservation == true
> reReservation = true;
>   } else {
> // When reserve a resource (state == NEW is for new container,
> // state == RUNNING is for increase container).
> // Just check if the node is not already reserved by someone
> if (schedulerContainer.getSchedulerNode().getReservedContainer()
> != null) {
>   if (LOG.isDebugEnabled()) {
> LOG.debug("Try to reserve a container, but the node is "
> + "already reserved by another container="
> + schedulerContainer.getSchedulerNode()
> .getReservedContainer().getContainerId());
>   }
>   return false;
> }
>   }
> {code}
> The reserved container on the node of the reserve proposal is currently checked 
> only for first-reserve containers.
> We should also confirm that the container already reserved on this node is equal 
> to the re-reserved container.
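
A minimal sketch of that extra check for the re-reservation branch, reusing the 
names from the snippet above (not the exact patch):

{code}
// Re-reservation case: the container already reserved on the node must be the
// same container this proposal re-reserves, otherwise reject the proposal.
RMContainer nodeReserved =
    schedulerContainer.getSchedulerNode().getReservedContainer();
if (nodeReserved != null
    && !nodeReserved.getContainerId().equals(
        schedulerContainer.getRmContainer().getContainerId())) {
  return false;
}
{code}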



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5876) TestResourceTrackerService#testGracefulDecommissionWithApp fails intermittently on trunk

2017-06-22 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059668#comment-16059668
 ] 

Robert Kanter commented on YARN-5876:
-

Test failures are unrelated - they fail without the patch too.

> TestResourceTrackerService#testGracefulDecommissionWithApp fails 
> intermittently on trunk
> 
>
> Key: YARN-5876
> URL: https://issues.apache.org/jira/browse/YARN-5876
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Varun Saxena
>Assignee: Robert Kanter
> Attachments: YARN-5876.001.patch
>
>
> {noformat}
> java.lang.AssertionError: node shouldn't be null
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertNotNull(Assert.java:621)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.MockRM.waitForState(MockRM.java:750)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestResourceTrackerService.testGracefulDecommissionWithApp(TestResourceTrackerService.java:318)
> {noformat}
> Refer to https://builds.apache.org/job/PreCommit-YARN-Build/13884/testReport/



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6716) Native services support for specifying component start order

2017-06-22 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059659#comment-16059659
 ] 

Jian He commented on YARN-6716:
---

When we assign priority keys to components, I think we should also consider the 
dependency order here: the depended-upon component should have a higher priority 
so that its containers get a higher chance to be allocated. Basically, we may sort 
the components in dependency order in the code below?

{code}
for (Component component : app.getComponents()) {
  priority = getNewPriority(priority);
  String name = component.getName();
  if (roles.containsKey(name)) {
continue;
  }
  log.info("Adding component: " + name);
  createComponent(name, component, priority++);
}
{code}
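
A rough sketch of such an ordering pass before that loop (the dependency accessor 
is an assumption based on the API described in this issue):

{code}
// Order components so that each one comes after the components it depends on,
// then run the existing createComponent(name, component, priority++) loop over
// 'ordered' instead of app.getComponents().
List<Component> ordered = new ArrayList<>();
Set<String> placed = new HashSet<>();
Deque<Component> pending = new ArrayDeque<>(app.getComponents());
int guard = pending.size() * pending.size();   // crude cycle guard for the sketch
while (!pending.isEmpty() && guard-- > 0) {
  Component c = pending.poll();
  List<String> deps = c.getDependencies();     // assumed accessor
  if (deps == null || placed.containsAll(deps)) {
    ordered.add(c);
    placed.add(c.getName());
  } else {
    pending.add(c);   // its dependencies are not placed yet, retry later
  }
}
// 'ordered' is complete only if there is no dependency cycle; a real pass
// should detect and reject cycles explicitly.
{code}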

> Native services support for specifying component start order
> 
>
> Key: YARN-6716
> URL: https://issues.apache.org/jira/browse/YARN-6716
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
> Attachments: YARN-6716-yarn-native-services.001.patch, 
> YARN-6716-yarn-native-services.002.patch, 
> YARN-6716-yarn-native-services.003.patch
>
>
> Some native services apps have components that should be started after other 
> components. The readiness_check and dependencies features of the native 
> services API are currently unimplemented, and we could use these to implement 
> a basic start order feature. When component B has a dependency on component 
> A, the AM could delay making a container request for component B until 
> component A's readiness check has passed (for all instances of component A).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6708) Nodemanager container crash after ext3 folder limit

2017-06-22 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-6708:
---
Attachment: YARN-6708.003.patch

Attaching a patch with the following modifications:
# Test case addition
# Findbugs correction
# Whitespace correction

Please review the attached patch.
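
For context, a minimal illustration of the failure mode described in the issue 
below (a hedged standalone demo, not code from the patch): 
{{java.io.File#exists()}} returns false rather than throwing when the calling 
user lacks execute permission on an ancestor directory, which is exactly the 
situation once the *0/* cache directory is created as 750 with a different 
owner.

{code}
import java.io.File;

// Hedged demo: run as the nodemanager user against a path such as
// .../usercache/mapred/filecache/0/14/<resource>, where the parent "0" is
// mode 750 and owned by another user. exists() quietly reports false, so the
// tracker assumes the resource is gone and localizes it again.
public class ExistenceCheckDemo {
  public static void main(String[] args) {
    File resource = new File(args[0]);
    System.out.println("exists as current user? " + resource.exists());
  }
}
{code}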

> Nodemanager container crash after ext3 folder limit
> ---
>
> Key: YARN-6708
> URL: https://issues.apache.org/jira/browse/YARN-6708
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-6708.001.patch, YARN-6708.002.patch, 
> YARN-6708.003.patch
>
>
> Configure the umask as *027* for the nodemanager service user and 
> {{yarn.nodemanager.local-cache.max-files-per-directory}} as {{40}}. After 
> 4 *private* dir localizations, the next directory will be *0/14*.
> Local Directory cache manager 
> {code}
> vm2:/opt/hadoop/release/data/nmlocal/usercache/mapred/filecache # l
> total 28
> drwx--x--- 7 mapred hadoop 4096 Jun 10 14:35 ./
> drwxr-s--- 4 mapred hadoop 4096 Jun 10 12:07 ../
> drwxr-x--- 3 mapred users  4096 Jun 10 14:36 0/
> drwxr-xr-x 3 mapred users  4096 Jun 10 12:15 10/
> drwxr-xr-x 3 mapred users  4096 Jun 10 12:22 11/
> drwxr-xr-x 3 mapred users  4096 Jun 10 12:27 12/
> drwxr-xr-x 3 mapred users  4096 Jun 10 12:31 13/
> {code}
> *drwxr-x---* 3 mapred users  4096 Jun 10 14:36 0/ is only *750*
> The nodemanager user will not be able to check whether the localization path exists.
> {{LocalResourcesTrackerImpl}}
> {code}
> case REQUEST:
>   if (rsrc != null && (!isResourcePresent(rsrc))) {
> LOG.info("Resource " + rsrc.getLocalPath()
> + " is missing, localizing it again");
> removeResource(req);
> rsrc = null;
>   }
>   if (null == rsrc) {
> rsrc = new LocalizedResource(req, dispatcher);
> localrsrc.put(req, rsrc);
>   }
>   break;
> {code}
> *isResourcePresent* will always return false and the same resource will be 
> localized again under {{0}} with the next unique number.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6708) Nodemanager container crash after ext3 folder limit

2017-06-22 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt reassigned YARN-6708:
--

Assignee: Bibin A Chundatt

> Nodemanager container crash after ext3 folder limit
> ---
>
> Key: YARN-6708
> URL: https://issues.apache.org/jira/browse/YARN-6708
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Critical
> Attachments: YARN-6708.001.patch, YARN-6708.002.patch, 
> YARN-6708.003.patch
>
>
> Configure the umask as *027* for the nodemanager service user and 
> {{yarn.nodemanager.local-cache.max-files-per-directory}} as {{40}}. After 
> 4 *private* dir localizations, the next directory will be *0/14*.
> Local Directory cache manager 
> {code}
> vm2:/opt/hadoop/release/data/nmlocal/usercache/mapred/filecache # l
> total 28
> drwx--x--- 7 mapred hadoop 4096 Jun 10 14:35 ./
> drwxr-s--- 4 mapred hadoop 4096 Jun 10 12:07 ../
> drwxr-x--- 3 mapred users  4096 Jun 10 14:36 0/
> drwxr-xr-x 3 mapred users  4096 Jun 10 12:15 10/
> drwxr-xr-x 3 mapred users  4096 Jun 10 12:22 11/
> drwxr-xr-x 3 mapred users  4096 Jun 10 12:27 12/
> drwxr-xr-x 3 mapred users  4096 Jun 10 12:31 13/
> {code}
> *drwxr-x---* 3 mapred users  4096 Jun 10 14:36 0/ is only *750*
> The nodemanager user will not be able to check whether the localization path exists.
> {{LocalResourcesTrackerImpl}}
> {code}
> case REQUEST:
>   if (rsrc != null && (!isResourcePresent(rsrc))) {
> LOG.info("Resource " + rsrc.getLocalPath()
> + " is missing, localizing it again");
> removeResource(req);
> rsrc = null;
>   }
>   if (null == rsrc) {
> rsrc = new LocalizedResource(req, dispatcher);
> localrsrc.put(req, rsrc);
>   }
>   break;
> {code}
> *isResourcePresent* will always return false and the same resource will be 
> localized again under {{0}} with the next unique number.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6678) Committer thread crashes with IllegalStateException in async-scheduling mode of CapacityScheduler

2017-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059298#comment-16059298
 ] 

Hadoop QA commented on YARN-6678:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
 4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 40m 30s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 64m 52s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerAsyncScheduling
 |
|   | hadoop.yarn.server.resourcemanager.TestRMRestart |
|   | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-6678 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12874072/YARN-6678.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux bbaa151c606c 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / b649519 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/16222/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/16222/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/16222/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Committer thread crashes with IllegalStateException in async-scheduling mode 
> of CapacityScheduler
> -
>
> Key: YARN-6678
> URL: https://issues.apache.org/jira/browse/YARN-6678
> 

[jira] [Updated] (YARN-6678) Committer thread crashes with IllegalStateException in async-scheduling mode of CapacityScheduler

2017-06-22 Thread Tao Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-6678:
---
Attachment: YARN-6678.003.patch

Updated the patch without adding a new method to CapacityScheduler. Thanks 
[~leftnoteasy] for your suggestion; it's fine to only change the spy target for 
the test case.
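
For illustration only (not necessarily the exact code in the patch), the extra 
guard that the description below calls for in the re-reservation branch of 
FiCaSchedulerApp#accept could look roughly like this:

{code}
// Hedged sketch: when the proposal is a re-reservation, also confirm that the
// node still holds the very same RMContainer the proposal refers to; if the
// node's reservation changed after the proposal was generated, reject it.
if (schedulerContainer.getRmContainer().getState()
    == RMContainerState.RESERVED) {
  reReservation = true;
  RMContainer nodeReserved =
      schedulerContainer.getSchedulerNode().getReservedContainer();
  if (nodeReserved != schedulerContainer.getRmContainer()) {
    return false; // outdated re-reserve proposal
  }
}
// (the existing first-reserve branch that rejects proposals when the node is
// already reserved by another container stays as shown in the description)
{code}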

> Committer thread crashes with IllegalStateException in async-scheduling mode 
> of CapacityScheduler
> -
>
> Key: YARN-6678
> URL: https://issues.apache.org/jira/browse/YARN-6678
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.9.0, 3.0.0-alpha3
>Reporter: Tao Yang
>Assignee: Tao Yang
> Attachments: YARN-6678.001.patch, YARN-6678.002.patch, 
> YARN-6678.003.patch
>
>
> Error log:
> {noformat}
> java.lang.IllegalStateException: Trying to reserve container 
> container_e10_1495599791406_7129_01_001453 for application 
> appattempt_1495599791406_7129_01 when currently reserved container 
> container_e10_1495599791406_7123_01_001513 on node host: node0123:45454 
> #containers=40 available=... used=...
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode.reserveResource(FiCaSchedulerNode.java:81)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.reserve(FiCaSchedulerApp.java:1079)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.apply(FiCaSchedulerApp.java:795)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.tryCommit(CapacityScheduler.java:2770)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$ResourceCommitterService.run(CapacityScheduler.java:546)
> {noformat}
> Reproduce this problem:
> 1. nm1 re-reserved app-1/container-X1 and generated reserve proposal-1
> 2. nm2 had enough resource for app-1, un-reserved app-1/container-X1 and 
> allocated app-1/container-X2
> 3. nm1 reserved app-2/container-Y
> 4. proposal-1 was accepted but threw an IllegalStateException when applied
> Currently, the check code for a reserve proposal in FiCaSchedulerApp#accept is 
> as follows:
> {code}
>   // Container reserved first time will be NEW, after the container
>   // accepted & confirmed, it will become RESERVED state
>   if (schedulerContainer.getRmContainer().getState()
>   == RMContainerState.RESERVED) {
> // Set reReservation == true
> reReservation = true;
>   } else {
> // When reserve a resource (state == NEW is for new container,
> // state == RUNNING is for increase container).
> // Just check if the node is not already reserved by someone
> if (schedulerContainer.getSchedulerNode().getReservedContainer()
> != null) {
>   if (LOG.isDebugEnabled()) {
> LOG.debug("Try to reserve a container, but the node is "
> + "already reserved by another container="
> + schedulerContainer.getSchedulerNode()
> .getReservedContainer().getContainerId());
>   }
>   return false;
> }
>   }
> {code}
> The reserved container on the node of a reserve proposal is currently checked 
> only for a first-reserve container.
> We should also confirm that the reserved container on this node is equal to 
> the re-reserved container.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6714) RM crashed with IllegalStateException while handling APP_ATTEMPT_REMOVED event when async-scheduling enabled in CapacityScheduler

2017-06-22 Thread Tao Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-6714:
---
Attachment: YARN-6714.002.patch

Updated the patch, moving the test case to TestCapacitySchedulerAsyncScheduling.
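
As the description below notes, doneApplicationAttempt and tryCommit are 
serialized by the scheduler write lock, so tryCommit can validate the app 
attempt before applying a proposal. A rough sketch of such a guard 
(illustrative only, not the actual patch; {{appAttemptId}} is assumed to come 
from the proposal being committed):

{code}
// Hedged sketch: drop proposals that belong to a removed or replaced attempt.
FiCaSchedulerApp app = getApplicationAttempt(appAttemptId);
if (app == null
    || !appAttemptId.equals(app.getApplicationAttemptId())
    || app.isStopped()) {
  LOG.info("Skip committing outdated proposal for " + appAttemptId);
  return;
}
// ... otherwise proceed with accept()/apply() as before
{code}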

> RM crashed with IllegalStateException while handling APP_ATTEMPT_REMOVED 
> event when async-scheduling enabled in CapacityScheduler
> -
>
> Key: YARN-6714
> URL: https://issues.apache.org/jira/browse/YARN-6714
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha3
>Reporter: Tao Yang
>Assignee: Tao Yang
> Attachments: YARN-6714.001.patch, YARN-6714.002.patch
>
>
> Currently, in the async-scheduling mode of CapacityScheduler, after an AM 
> fails over and all of its reserved containers are unreserved, the scheduler 
> still has a chance to get and commit an outdated reserve proposal from the 
> failed app attempt. This problem happened to an app in our cluster: when the 
> app stopped, it unreserved all reserved containers and compared their 
> appAttemptId with the current appAttemptId; on a mismatch it threw an 
> IllegalStateException and crashed the RM.
> Error log:
> {noformat}
> 2017-06-08 11:02:24,339 FATAL [ResourceManager Event Processor] 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in 
> handling event type APP_ATTEMPT_REMOVED to the scheduler
> java.lang.IllegalStateException: Trying to unreserve  for application 
> appattempt_1495188831758_0121_02 when currently reserved  for application 
> application_1495188831758_0121 on node host: node1:45454 #containers=2 
> available=... used=...
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerNode.unreserveResource(FiCaSchedulerNode.java:123)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp.unreserve(FiCaSchedulerApp.java:845)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.completedContainer(LeafQueue.java:1787)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.completedContainerInternal(CapacityScheduler.java:1957)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.completedContainer(AbstractYarnScheduler.java:586)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.doneApplicationAttempt(CapacityScheduler.java:966)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1740)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:152)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:822)
> at java.lang.Thread.run(Thread.java:834)
> {noformat}
> When async-scheduling is enabled, CapacityScheduler#doneApplicationAttempt and 
> CapacityScheduler#tryCommit both need to acquire the write lock before 
> executing, so we can check the app attempt state in the commit process to 
> avoid committing outdated proposals.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5648) [ATSv2 Security] Client side changes for authentication

2017-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059206#comment-16059206
 ] 

Hadoop QA commented on YARN-5648:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 12m  
7s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
28s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
 1s{color} | {color:green} YARN-5355 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
34s{color} | {color:green} YARN-5355 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
51s{color} | {color:green} YARN-5355 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
3s{color} | {color:green} YARN-5355 passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m  
7s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in 
YARN-5355 has 2 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} YARN-5355 passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests 
{color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
25s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  6m 41s{color} 
| {color:red} hadoop-yarn-server-tests in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 69m 15s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.TestMiniYarnClusterNodeUtilization |
|   | hadoop.yarn.server.TestContainerManagerSecurity |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ac17dc |
| JIRA Issue | YARN-5648 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12874050/YARN-5648-YARN-5355.04.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  xml  |
| uname | Linux 136823d8d1d0 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 
09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | YARN-5355 / 0763450 |
| 

[jira] [Updated] (YARN-5076) YARN web interfaces lack XFS protection

2017-06-22 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-5076:

Labels: security  (was: )

> YARN web interfaces lack XFS protection
> ---
>
> Key: YARN-5076
> URL: https://issues.apache.org/jira/browse/YARN-5076
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager, timelineserver
>Reporter: Jonathan Maron
>Assignee: Jonathan Maron
>  Labels: security
> Fix For: 2.9.0, 3.0.0-alpha1
>
> Attachments: YARN-5076.002.patch, YARN-5076.003.patch, 
> YARN-5076.004.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> There are web interfaces in YARN that do not provide protection against cross 
> frame scripting 
> (https://www.owasp.org/index.php/Clickjacking_Defense_Cheat_Sheet).  
> HADOOP-13008 provides a common filter for addressing this vulnerability, so 
> this filter should be integrated into the YARN web interfaces.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5648) [ATSv2 Security] Client side changes for authentication

2017-06-22 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-5648:
---
Attachment: YARN-5648-YARN-5355.04.patch

> [ATSv2 Security] Client side changes for authentication
> ---
>
> Key: YARN-5648
> URL: https://issues.apache.org/jira/browse/YARN-5648
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-5355-merge-blocker
> Attachments: YARN-5648-YARN-5355.02.patch, 
> YARN-5648-YARN-5355.03.patch, YARN-5648-YARN-5355.04.patch, 
> YARN-5648-YARN-5355.wip.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5648) [ATSv2 Security] Client side changes for authentication

2017-06-22 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-5648:
---
Attachment: (was: YARN-5648-YARN-5355.04.patch)

> [ATSv2 Security] Client side changes for authentication
> ---
>
> Key: YARN-5648
> URL: https://issues.apache.org/jira/browse/YARN-5648
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-5355-merge-blocker
> Attachments: YARN-5648-YARN-5355.02.patch, 
> YARN-5648-YARN-5355.03.patch, YARN-5648-YARN-5355.04.patch, 
> YARN-5648-YARN-5355.wip.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5648) [ATSv2 Security] Client side changes for authentication

2017-06-22 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-5648:
---
Attachment: YARN-5648-YARN-5355.04.patch

> [ATSv2 Security] Client side changes for authentication
> ---
>
> Key: YARN-5648
> URL: https://issues.apache.org/jira/browse/YARN-5648
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-5355-merge-blocker
> Attachments: YARN-5648-YARN-5355.02.patch, 
> YARN-5648-YARN-5355.03.patch, YARN-5648-YARN-5355.04.patch, 
> YARN-5648-YARN-5355.wip.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6728) Job will run slow when the performance of defaultFs degrades and the log-aggregation is enable.

2017-06-22 Thread zhengchenyu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16059056#comment-16059056
 ] 

zhengchenyu commented on YARN-6728:
---

Our partner [~maobaolong] suggests moving verifyAndCreateRemoteLogDir and the 
remote-log mkdir to a daemon thread. This way the container will not be blocked 
by a slow defaultFs.
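
A minimal sketch of that idea (a hedged illustration, not an actual change; the 
single-thread executor and the exact call site are assumptions):

{code}
// Hedged sketch: push the remote-log-dir checks onto a background thread so
// the AsyncDispatcher thread handling application init never blocks on a slow
// defaultFs. Uses java.util.concurrent.ExecutorService/Executors.
private final ExecutorService remoteLogDirExecutor =
    Executors.newSingleThreadExecutor();

private void setupRemoteLogDirAsync(final ApplicationId appId) {
  remoteLogDirExecutor.submit(new Runnable() {
    @Override
    public void run() {
      try {
        verifyAndCreateRemoteLogDir(getConfig());   // slow defaultFs call
      } catch (Exception e) {
        LOG.warn("Deferred remote log dir setup failed for " + appId, e);
      }
    }
  });
}
{code}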

> Job will run slow when the performance of defaultFs degrades and the 
> log-aggregation is enable. 
> 
>
> Key: YARN-6728
> URL: https://issues.apache.org/jira/browse/YARN-6728
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Affects Versions: 2.7.1
> Environment: CentOS 7.1 hadoop-2.7.1
>Reporter: zhengchenyu
> Fix For: 2.9.0, 2.7.4
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> In our cluster, I found many map keep "NEW" state  for several minutes. Here 
> I got the container log: 
> {code}
> [2017-06-13T18:21:23.068+08:00] [INFO] 
> containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 
> 304) [AsyncDispatcher event handler] : Adding 
> container_1495632926847_2459604_01_11 to application 
> application_1495632926847_2459604
> [2017-06-13T18:23:08.715+08:00] [INFO] 
> containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) 
> [AsyncDispatcher event handler] : Container 
> container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING
> {code}
> Then I search the log from 18:21:23.068 to 18:23:08.715. I found some 
> dispatch of  AsyncDispather run slow, because they visit the defaultFs. Our 
> cluster increase to 4k node, the pressure of defaultFs increase.  (Note: 
> log-aggregation is enable. )
> Container runs in nodemanager will invoke initApp(), then invoke 
> verifyAndCreateRemoteLogDir and mkdir remote log, these operation will visit 
> the defaultFs. So the container will be stuck here. Then application will run 
> slow.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6728) Job will run slow when the performance of defaultFs degrades and the log-aggregation is enable.

2017-06-22 Thread zhengchenyu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated YARN-6728:
--
Description: 
In our cluster, I found many map keep "NEW" state  for several minutes. Here I 
got the container log: 
{code}
[2017-06-13T18:21:23.068+08:00] [INFO] 
containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 
304) [AsyncDispatcher event handler] : Adding 
container_1495632926847_2459604_01_11 to application 
application_1495632926847_2459604
[2017-06-13T18:23:08.715+08:00] [INFO] 
containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) 
[AsyncDispatcher event handler] : Container 
container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING
{code}

Then I search the log from 18:21:23.068 to 18:23:08.715. I found some dispatch 
of  AsyncDispather run slow, because they visit the defaultFs. Our cluster 
increase to 4k node, the pressure of defaultFs increase.  (Note: 
log-aggregation is enable. )

Container runs in nodemanager will invoke initApp(), then invoke 
verifyAndCreateRemoteLogDir and mkdir remote log, these operation will visit 
the defaultFs. So the container will be stuck here. Then application will run 
slow.

  was:
In our cluster, I found many map keep "NEW" state  for several minutes. Here I 
got the container log: 
{code}
[2017-06-13T18:21:23.068+08:00] [INFO] 
containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 
304) [AsyncDispatcher event handler] : Adding 
container_1495632926847_2459604_01_11 to application 
application_1495632926847_2459604
[2017-06-13T18:23:08.715+08:00] [INFO] 
containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) 
[AsyncDispatcher event handler] : Container 
container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING
{code}

Then I search the log from 18:21:23.068 to 18:23:08.715. I found some dispatch 
of  AsyncDispather run slow, because they visit the defaultFs. Our cluster 
increase to 4k node, the pressure of defaultFs increase.  (Note: 
log-aggregation is enable. )

Container runs in nodemanager will invoke initApp(), then invoke 
verifyAndCreateRemoteLogDir and mkdir remote log. So the container will be 
stuck here. Then application will run slow.


> Job will run slow when the performance of defaultFs degrades and the 
> log-aggregation is enable. 
> 
>
> Key: YARN-6728
> URL: https://issues.apache.org/jira/browse/YARN-6728
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Affects Versions: 2.7.1
> Environment: CentOS 7.1 hadoop-2.7.1
>Reporter: zhengchenyu
> Fix For: 2.9.0, 2.7.4
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> In our cluster, I found many map keep "NEW" state  for several minutes. Here 
> I got the container log: 
> {code}
> [2017-06-13T18:21:23.068+08:00] [INFO] 
> containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 
> 304) [AsyncDispatcher event handler] : Adding 
> container_1495632926847_2459604_01_11 to application 
> application_1495632926847_2459604
> [2017-06-13T18:23:08.715+08:00] [INFO] 
> containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) 
> [AsyncDispatcher event handler] : Container 
> container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING
> {code}
> Then I search the log from 18:21:23.068 to 18:23:08.715. I found some 
> dispatch of  AsyncDispather run slow, because they visit the defaultFs. Our 
> cluster increase to 4k node, the pressure of defaultFs increase.  (Note: 
> log-aggregation is enable. )
> Container runs in nodemanager will invoke initApp(), then invoke 
> verifyAndCreateRemoteLogDir and mkdir remote log, these operation will visit 
> the defaultFs. So the container will be stuck here. Then application will run 
> slow.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6728) Job will run slow when the performance of defaultFs degrades and the log-aggregation is enable.

2017-06-22 Thread zhengchenyu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated YARN-6728:
--
Description: 
In our cluster, I found many map keep "NEW" state  for several minutes. Here I 
got the container log: 
{code}
[2017-06-13T18:21:23.068+08:00] [INFO] 
containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 
304) [AsyncDispatcher event handler] : Adding 
container_1495632926847_2459604_01_11 to application 
application_1495632926847_2459604
[2017-06-13T18:23:08.715+08:00] [INFO] 
containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) 
[AsyncDispatcher event handler] : Container 
container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING
{code}

Then I search the log from 18:21:23.068 to 18:23:08.715. I found some dispatch 
of  AsyncDispather run slow, because they visit the defaultFs. Our cluster 
increase to 4k node, the pressure of defaultFs increase.  (Note: 
log-aggregation is enable. )

Container runs in nodemanager will invoke initApp(), then invoke 
verifyAndCreateRemoteLogDir and mkdir remote log. So the container will be 
stuck here. Then application will run slow.

  was:
In our cluster, I found many map keep "NEW" state  for several minutes. Here I 
got the container log: 
{code}
[2017-06-13T18:21:23.068+08:00] [INFO] 
containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 
304) [AsyncDispatcher event handler] : Adding 
container_1495632926847_2459604_01_11 to application 
application_1495632926847_2459604
[2017-06-13T18:23:08.715+08:00] [INFO] 
containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) 
[AsyncDispatcher event handler] : Container 
container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING
{code}

Then I search the log from 18:21:23.068 to 18:23:08.715. I found some dispatch 
of  AsyncDispather run slow, because they visit the defaultFs. Our cluster 
increase to 4k node, the pressure of defaultFs increase. (Note: log-aggregation 
is enable. )




> Job will run slow when the performance of defaultFs degrades and the 
> log-aggregation is enable. 
> 
>
> Key: YARN-6728
> URL: https://issues.apache.org/jira/browse/YARN-6728
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Affects Versions: 2.7.1
> Environment: CentOS 7.1 hadoop-2.7.1
>Reporter: zhengchenyu
> Fix For: 2.9.0, 2.7.4
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> In our cluster, I found many map keep "NEW" state  for several minutes. Here 
> I got the container log: 
> {code}
> [2017-06-13T18:21:23.068+08:00] [INFO] 
> containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 
> 304) [AsyncDispatcher event handler] : Adding 
> container_1495632926847_2459604_01_11 to application 
> application_1495632926847_2459604
> [2017-06-13T18:23:08.715+08:00] [INFO] 
> containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) 
> [AsyncDispatcher event handler] : Container 
> container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING
> {code}
> Then I search the log from 18:21:23.068 to 18:23:08.715. I found some 
> dispatch of  AsyncDispather run slow, because they visit the defaultFs. Our 
> cluster increase to 4k node, the pressure of defaultFs increase.  (Note: 
> log-aggregation is enable. )
> Container runs in nodemanager will invoke initApp(), then invoke 
> verifyAndCreateRemoteLogDir and mkdir remote log. So the container will be 
> stuck here. Then application will run slow.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6728) Job will run slow when the performance of defaultFs degrades and the log-aggregation is enable.

2017-06-22 Thread zhengchenyu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated YARN-6728:
--
Description: 
In our cluster, I found many map keep "NEW" state  for several minutes. Here I 
got the container log: 
{code}
[2017-06-13T18:21:23.068+08:00] [INFO] 
containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 
304) [AsyncDispatcher event handler] : Adding 
container_1495632926847_2459604_01_11 to application 
application_1495632926847_2459604
[2017-06-13T18:23:08.715+08:00] [INFO] 
containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) 
[AsyncDispatcher event handler] : Container 
container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING
{code}

Then I search the log from 18:21:23.068 to 18:23:08.715. I found some dispatch 
of  AsyncDispather run slow, because they visit the defaultFs. Our cluster 
increase to 4k node, the pressure of defaultFs increase. (Note: log-aggregation 
is enable. )



  was:
In our cluster, I found many map keep "NEW" state  for several minutes. Here I 
got the container log: 
{code}
[2017-06-13T18:21:23.068+08:00] [INFO] 
containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 
304) [AsyncDispatcher event handler] : Adding 
container_1495632926847_2459604_01_11 to application 
application_1495632926847_2459604
[2017-06-13T18:23:08.715+08:00] [INFO] 
containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) 
[AsyncDispatcher event handler] : Container 
container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING
{code}

Then I search the log from 18:21:23.068 to 18:23:08.715. I found some dispatch 
of  AsyncDispather run slow, because they visit the defaultFs. Our cluster 
increase to 4k node, the pressure of defaultFs increase. (Note: we )




> Job will run slow when the performance of defaultFs degrades and the 
> log-aggregation is enable. 
> 
>
> Key: YARN-6728
> URL: https://issues.apache.org/jira/browse/YARN-6728
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Affects Versions: 2.7.1
> Environment: CentOS 7.1 hadoop-2.7.1
>Reporter: zhengchenyu
> Fix For: 2.9.0, 2.7.4
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> In our cluster, I found many map keep "NEW" state  for several minutes. Here 
> I got the container log: 
> {code}
> [2017-06-13T18:21:23.068+08:00] [INFO] 
> containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 
> 304) [AsyncDispatcher event handler] : Adding 
> container_1495632926847_2459604_01_11 to application 
> application_1495632926847_2459604
> [2017-06-13T18:23:08.715+08:00] [INFO] 
> containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) 
> [AsyncDispatcher event handler] : Container 
> container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING
> {code}
> Then I search the log from 18:21:23.068 to 18:23:08.715. I found some 
> dispatch of  AsyncDispather run slow, because they visit the defaultFs. Our 
> cluster increase to 4k node, the pressure of defaultFs increase. (Note: 
> log-aggregation is enable. )



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6728) Job will run slow when the performance of defaultFs degrades and the log-aggregation is enable.

2017-06-22 Thread zhengchenyu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengchenyu updated YARN-6728:
--
Description: 
In our cluster, I found many map keep "NEW" state  for several minutes. Here I 
got the container log: 
{code}
[2017-06-13T18:21:23.068+08:00] [INFO] 
containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 
304) [AsyncDispatcher event handler] : Adding 
container_1495632926847_2459604_01_11 to application 
application_1495632926847_2459604
[2017-06-13T18:23:08.715+08:00] [INFO] 
containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) 
[AsyncDispatcher event handler] : Container 
container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING
{code}

Then I search the log from 18:21:23.068 to 18:23:08.715. I found some dispatch 
of  AsyncDispather run slow, because they visit the defaultFs. Our cluster 
increase to 4k node, the pressure of defaultFs increase. (Note: we )



  was:Job will run slow when the performance of defaultFs degrades and the 
log-aggregation is enable. 


> Job will run slow when the performance of defaultFs degrades and the 
> log-aggregation is enable. 
> 
>
> Key: YARN-6728
> URL: https://issues.apache.org/jira/browse/YARN-6728
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Affects Versions: 2.7.1
> Environment: CentOS 7.1 hadoop-2.7.1
>Reporter: zhengchenyu
> Fix For: 2.9.0, 2.7.4
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> In our cluster, I found many map keep "NEW" state  for several minutes. Here 
> I got the container log: 
> {code}
> [2017-06-13T18:21:23.068+08:00] [INFO] 
> containermanager.application.ApplicationImpl.transition(ApplicationImpl.java 
> 304) [AsyncDispatcher event handler] : Adding 
> container_1495632926847_2459604_01_11 to application 
> application_1495632926847_2459604
> [2017-06-13T18:23:08.715+08:00] [INFO] 
> containermanager.container.ContainerImpl.handle(ContainerImpl.java 1137) 
> [AsyncDispatcher event handler] : Container 
> container_1495632926847_2459604_01_11 transitioned from NEW to LOCALIZING
> {code}
> Then I search the log from 18:21:23.068 to 18:23:08.715. I found some 
> dispatch of  AsyncDispather run slow, because they visit the defaultFs. Our 
> cluster increase to 4k node, the pressure of defaultFs increase. (Note: we )



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6728) Job will run slow when the performance of defaultFs degrades and the log-aggregation is enable.

2017-06-22 Thread zhengchenyu (JIRA)
zhengchenyu created YARN-6728:
-

 Summary: Job will run slow when the performance of defaultFs 
degrades and the log-aggregation is enable. 
 Key: YARN-6728
 URL: https://issues.apache.org/jira/browse/YARN-6728
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager, yarn
Affects Versions: 2.7.1
 Environment: CentOS 7.1 hadoop-2.7.1
Reporter: zhengchenyu
 Fix For: 2.9.0, 2.7.4


Job will run slow when the performance of defaultFs degrades and the 
log-aggregation is enable. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-2919) Potential race between renew and cancel in DelegationTokenRenwer

2017-06-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058972#comment-16058972
 ] 

Hadoop QA commented on YARN-2919:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
1s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
22s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 18m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
 6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
7s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
40s{color} | {color:red} hadoop-common-project/hadoop-common in trunk has 17 
extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
26s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m  
3s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m  5s{color} | {color:orange} root: The patch generated 9 new + 101 unchanged 
- 0 fixed = 110 total (was 101) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  9m  7s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 44m 49s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
38s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}142m 23s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.ha.TestActiveStandbyElectorRealZK |
|   | 
hadoop.yarn.server.resourcemanager.TestOpportunisticContainerAllocatorAMService 
|
|   | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer |
| Timed out junit tests | 
org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-2919 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12874013/YARN-2919.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 80a3a2a3ab73 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 
14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 9ae9467 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| findbugs | 
https://builds.apache.org/job/PreCommit-YARN-Build/16220/artifact/patchprocess/branch-findbugs-hadoop-common-project_hadoop-common-warnings.html
 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/16220/artifact/patchprocess/diff-checkstyle-root.txt
 |
| unit | 

[jira] [Commented] (YARN-6727) Improve getQueueUserAcls API to query for specific queue and user

2017-06-22 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058945#comment-16058945
 ] 

Naganarasimha G R commented on YARN-6727:
-

Thanks [~bibinchundatt] for raising this issue. Agreed that this can be 
simplified, and on the server side it gets simplified as well.
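
To make the proposal below concrete, the client-side call could end up looking 
roughly like this (the {{setQueueName}}/{{setUserName}} setters are 
hypothetical; adding such optional filters is exactly what this JIRA suggests):

{code}
// Hedged sketch of the proposed request shape; today's API has no such filters.
// "client" is assumed to be an ApplicationClientProtocol proxy.
GetQueueUserAclsInfoRequest request = GetQueueUserAclsInfoRequest.newInstance();
request.setQueueName("root.analytics");  // hypothetical optional queue filter
request.setUserName("alice");            // hypothetical; admin-only for others
List<QueueUserACLInfo> acls =
    client.getQueueUserAcls(request).getUserAclsInfoList();
{code}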

> Improve getQueueUserAcls API to query for  specific queue and user
> --
>
> Key: YARN-6727
> URL: https://issues.apache.org/jira/browse/YARN-6727
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>
> Currently {{ApplicationClientProtocol#getQueueUserAcls}} returns data for all 
> the queues available in the scheduler for the user.
> A user may want to know only whether they have rights on a particular queue. 
> For systems with 5K queues, returning the full queue list is not efficient.
> Suggested change: support the additional optional parameters *userName and 
> queueName*. An admin user should be able to query another user's ACL for a 
> particular queueName.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6727) Improve getQueueUserAcls API to query for specific queue and user

2017-06-22 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-6727:
---
Summary: Improve getQueueUserAcls API to query for  specific queue and user 
 (was: Improve getQueueUserAcls API to queery for  specific queue and user)

> Improve getQueueUserAcls API to query for  specific queue and user
> --
>
> Key: YARN-6727
> URL: https://issues.apache.org/jira/browse/YARN-6727
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>
> Currently {{ApplicationClientProtocol#getQueueUserAcls}} returns data for all 
> the queues available in the scheduler for the user.
> A user may want to know only whether they have rights on a particular queue. 
> For systems with 5K queues, returning the full queue list is not efficient.
> Suggested change: support the additional optional parameters *userName and 
> queueName*. An admin user should be able to query another user's ACL for a 
> particular queueName.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-6727) Improve getQueueUserAcls API to queery for specific queue and user

2017-06-22 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-6727:
--

 Summary: Improve getQueueUserAcls API to queery for  specific 
queue and user
 Key: YARN-6727
 URL: https://issues.apache.org/jira/browse/YARN-6727
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt


Currently {{ApplicationClientProtocol#getQueueUserAcls}} returns data for all 
the queues available in the scheduler for the user.

A user may want to know only whether they have rights on a particular queue. 
For systems with 5K queues, returning the full queue list is not efficient.

Suggested change: support the additional optional parameters *userName and 
queueName*. An admin user should be able to query another user's ACL for a 
particular queueName.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5648) [ATSv2 Security] Client side changes for authentication

2017-06-22 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058839#comment-16058839
 ] 

Varun Saxena commented on YARN-5648:


Checkstyle failure is related. Will update a patch.

> [ATSv2 Security] Client side changes for authentication
> ---
>
> Key: YARN-5648
> URL: https://issues.apache.org/jira/browse/YARN-5648
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-5355-merge-blocker
> Attachments: YARN-5648-YARN-5355.02.patch, 
> YARN-5648-YARN-5355.03.patch, YARN-5648-YARN-5355.wip.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org