[jira] [Commented] (YARN-8266) [UI2] Clicking on application from cluster view should redirect to application attempt page

2018-05-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475368#comment-16475368
 ] 

Hudson commented on YARN-8266:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14197 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14197/])
YARN-8266. [UI2] Clicking on application from cluster view should (sunilg: rev 
796b2b0ee36e8e9225fb76ae35edc58ad907b737)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/utils/href-address-utils.js


> [UI2] Clicking on application from cluster view should redirect to 
> application attempt page
> ---
>
> Key: YARN-8266
> URL: https://issues.apache.org/jira/browse/YARN-8266
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8266.001.patch
>
>
> Steps:
> 1) Start one application
>  2) Go to cluster overview page
>  3) Click on applicationId from Cluster Resource Usage By Application
> This action redirects to the 
> [http://xxx:8088/ui2/#/yarn-app/application_1525740862939_0005] URL. This is an 
> invalid URL; it does not show any details.
> Instead, it should redirect to the application attempt page.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8289) Modify distributedshell to support Node Attributes

2018-05-14 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475361#comment-16475361
 ] 

Naganarasimha G R commented on YARN-8289:
-

[~sunil.gov...@gmail.com], given that you are already working on the scheduler 
patch and testing it with the DS shell, would you like to take this Jira up? 

> Modify distributedshell to support Node Attributes
> --
>
> Key: YARN-8289
> URL: https://issues.apache.org/jira/browse/YARN-8289
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: distributed-shell
>Affects Versions: YARN-3409
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Major
>
> Modifications required in Distributed shell to support NodeAttributes



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8236) Invalid kerberos principal file name cause NPE in native service

2018-05-14 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475358#comment-16475358
 ] 

Sunil G commented on YARN-8236:
---

+1. Committing shortly.

> Invalid kerberos principal file name cause NPE in native service
> 
>
> Key: YARN-8236
> URL: https://issues.apache.org/jira/browse/YARN-8236
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Sunil G
>Assignee: Gour Saha
>Priority: Critical
> Attachments: YARN-8236.01.patch, YARN-8236.02.patch
>
>
> Stack trace
>  
> {code:java}
> 2018-04-29 16:22:54,266 WARN webapp.GenericExceptionHandler 
> (GenericExceptionHandler.java:toResponse(98)) - INTERNAL_SERVER_ERROR
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.service.client.ServiceClient.addKeytabResourceIfSecure(ServiceClient.java:994)
> at 
> org.apache.hadoop.yarn.service.client.ServiceClient.submitApp(ServiceClient.java:685)
> at 
> org.apache.hadoop.yarn.service.client.ServiceClient.actionCreate(ServiceClient.java:269){code}
> cc [~gsaha] [~csingh]
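
Not the actual patch, just a minimal sketch of the kind of defensive validation that would turn this NPE into a clear error, assuming the keytab location arrives as a plain string from the service spec (the parameter names below are illustrative, not the ServiceClient API):

{code:java}
// Hypothetical guard, not the YARN-8236 fix. "principalName" and "keytab"
// stand in for values read from the spec's kerberos_principal. Fail fast with
// a clear message instead of an NPE deep inside addKeytabResourceIfSecure.
static java.net.URI validateKeytab(String principalName, String keytab) {
  if (keytab == null || keytab.isEmpty()) {
    throw new IllegalArgumentException(
        "No keytab specified for principal " + principalName);
  }
  java.net.URI keytabURI = java.net.URI.create(keytab);
  if (keytabURI.getScheme() == null) {
    throw new IllegalArgumentException(
        "Invalid keytab URI (expected a scheme such as hdfs:// or file://): " + keytab);
  }
  return keytabURI;
}
{code}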



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8278) DistributedScheduling not working in HA

2018-05-14 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475349#comment-16475349
 ] 

Bibin A Chundatt commented on YARN-8278:


[~cheersyang]

Thank you for the review. Attached a patch addressing the comments.

> DistributedScheduling not working in HA
> ---
>
> Key: YARN-8278
> URL: https://issues.apache.org/jira/browse/YARN-8278
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Attachments: YARN-8278.001.patch, YARN-8278.002.patch
>
>
> Configured an HA cluster and submitted an application with the distributed 
> scheduling configuration enabled
> {code}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.YarnRuntimeException):
>  java.lang.IllegalArgumentException: ResourceManager does not support this 
> protocol
> at 
> org.apache.hadoop.yarn.server.nodemanager.amrmproxy.DefaultRequestInterceptor.init(DefaultRequestInterceptor.java:91)
> at 
> org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AbstractRequestInterceptor.init(AbstractRequestInterceptor.java:82)
> at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.init(DistributedScheduler.java:89)
> at 
> org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.initializePipeline(AMRMProxyService.java:450)
> at 
> org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.processApplicationStartRequest(AMRMProxyService.java:369)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:942)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:101)
> at 
> org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:223)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
> {code}
> {{ServerRMProxy#checkAllowedProtocols}} should allow 
> {{DistributedSchedulingAMProtocol}}
> {code}
>   public void checkAllowedProtocols(Class protocol) {
> Preconditions.checkArgument(
> protocol.isAssignableFrom(ResourceTracker.class),
> "ResourceManager does not support this protocol");
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8278) DistributedScheduling not working in HA

2018-05-14 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-8278:
---
Attachment: YARN-8278.002.patch

> DistributedScheduling not working in HA
> ---
>
> Key: YARN-8278
> URL: https://issues.apache.org/jira/browse/YARN-8278
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Attachments: YARN-8278.001.patch, YARN-8278.002.patch
>
>
> Configured an HA cluster and submitted an application with the distributed 
> scheduling configuration enabled
> {code}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.YarnRuntimeException):
>  java.lang.IllegalArgumentException: ResourceManager does not support this 
> protocol
> at 
> org.apache.hadoop.yarn.server.nodemanager.amrmproxy.DefaultRequestInterceptor.init(DefaultRequestInterceptor.java:91)
> at 
> org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AbstractRequestInterceptor.init(AbstractRequestInterceptor.java:82)
> at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.init(DistributedScheduler.java:89)
> at 
> org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.initializePipeline(AMRMProxyService.java:450)
> at 
> org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.processApplicationStartRequest(AMRMProxyService.java:369)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:942)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:101)
> at 
> org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:223)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
> {code}
> {{ServerRMProxy#checkAllowedProtocols}} should allow 
> {{DistributedSchedulingAMProtocol}}
> {code}
>   public void checkAllowedProtocols(Class protocol) {
> Preconditions.checkArgument(
> protocol.isAssignableFrom(ResourceTracker.class),
> "ResourceManager does not support this protocol");
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8166) [UI2] Service page header links are broken

2018-05-14 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-8166:
--
Summary: [UI2] Service page header links are broken  (was: Service AppId 
page throws HTTP Error 401)

> [UI2] Service page header links are broken
> --
>
> Key: YARN-8166
> URL: https://issues.apache.org/jira/browse/YARN-8166
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: YARN-8166.001.patch
>
>
> Steps:
> 1) Launch a YARN service in a non-secure cluster
> 2) Go to the component info page for sleeper-0
> 3) Click on the sleeper link
> http://xxx:8088/ui2/#/yarn-component-instances/sleeper/components?service=yesha-sleeper&&appid=application_1518804855867_0002
> The above URL fails with HTTP Error 401
>  {code}
> 401, Authorization required.
> Please check your security settings.
>  {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8130) Race condition when container events are published for KILLED applications

2018-05-14 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475340#comment-16475340
 ] 

Rohith Sharma K S commented on YARN-8130:
-

Thanks to [~haibochen] and [~vrushalic] for the review. I back-ported this to 
branch-3.1/branch-3.0/branch-2 as well. 

> Race condition when container events are published for KILLED applications
> --
>
> Key: YARN-8130
> URL: https://issues.apache.org/jira/browse/YARN-8130
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Reporter: Charan Hebri
>Assignee: Rohith Sharma K S
>Priority: Major
> Fix For: 2.10.0, 3.2.0, 3.1.1, 3.0.3
>
> Attachments: YARN-8130.01.patch, YARN-8130.02.patch, 
> YARN-8130.03.patch
>
>
> There seems to be a race condition happening when an application is KILLED 
> and the corresponding container event information is being published. For 
> completed containers, a YARN_CONTAINER_FINISHED event is generated but for 
> some containers in a KILLED application this information is missing. Below is 
> a node manager log snippet,
> {code:java}
> 2018-04-09 08:44:54,474 INFO  shuffle.ExternalShuffleBlockResolver 
> (ExternalShuffleBlockResolver.java:applicationRemoved(186)) - Application 
> application_1523259757659_0003 removed, cleanupLocalDirs = false
> 2018-04-09 08:44:54,478 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(632)) - Application 
> application_1523259757659_0003 transitioned from 
> APPLICATION_RESOURCES_CLEANINGUP to FINISHED
> 2018-04-09 08:44:54,478 ERROR timelineservice.NMTimelinePublisher 
> (NMTimelinePublisher.java:putEntity(298)) - Seems like client has been 
> removed before the entity could be published for 
> TimelineEntity[type='YARN_CONTAINER', 
> id='container_1523259757659_0003_01_02']
> 2018-04-09 08:44:54,478 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:finishLogAggregation(520)) - Application just 
> finished : application_1523259757659_0003
> 2018-04-09 08:44:54,488 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs 
> for container container_1523259757659_0003_01_01. Current good log dirs 
> are /grid/0/hadoop/yarn/log
> 2018-04-09 08:44:54,492 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs 
> for container container_1523259757659_0003_01_02. Current good log dirs 
> are /grid/0/hadoop/yarn/log
> 2018-04-09 08:44:55,470 INFO  collector.TimelineCollectorManager 
> (TimelineCollectorManager.java:remove(192)) - The collector service for 
> application_1523259757659_0003 was removed
> 2018-04-09 08:44:55,472 INFO  containermanager.ContainerManagerImpl 
> (ContainerManagerImpl.java:handle(1572)) - couldn't find application 
> application_1523259757659_0003 while processing FINISH_APPS event. The 
> ResourceManager allocated resources for this application to the NodeManager 
> but no active containers were found to process{code}
> The container id specified in the log, 
> *container_1523259757659_0003_01_02* is the one that has the finished 
> event missing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8130) Race condition when container events are published for KILLED applications

2018-05-14 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-8130:

Fix Version/s: 3.0.3
   3.1.1
   2.10.0

> Race condition when container events are published for KILLED applications
> --
>
> Key: YARN-8130
> URL: https://issues.apache.org/jira/browse/YARN-8130
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Reporter: Charan Hebri
>Assignee: Rohith Sharma K S
>Priority: Major
> Fix For: 2.10.0, 3.2.0, 3.1.1, 3.0.3
>
> Attachments: YARN-8130.01.patch, YARN-8130.02.patch, 
> YARN-8130.03.patch
>
>
> There seems to be a race condition happening when an application is KILLED 
> and the corresponding container event information is being published. For 
> completed containers, a YARN_CONTAINER_FINISHED event is generated but for 
> some containers in a KILLED application this information is missing. Below is 
> a node manager log snippet,
> {code:java}
> 2018-04-09 08:44:54,474 INFO  shuffle.ExternalShuffleBlockResolver 
> (ExternalShuffleBlockResolver.java:applicationRemoved(186)) - Application 
> application_1523259757659_0003 removed, cleanupLocalDirs = false
> 2018-04-09 08:44:54,478 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(632)) - Application 
> application_1523259757659_0003 transitioned from 
> APPLICATION_RESOURCES_CLEANINGUP to FINISHED
> 2018-04-09 08:44:54,478 ERROR timelineservice.NMTimelinePublisher 
> (NMTimelinePublisher.java:putEntity(298)) - Seems like client has been 
> removed before the entity could be published for 
> TimelineEntity[type='YARN_CONTAINER', 
> id='container_1523259757659_0003_01_02']
> 2018-04-09 08:44:54,478 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:finishLogAggregation(520)) - Application just 
> finished : application_1523259757659_0003
> 2018-04-09 08:44:54,488 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs 
> for container container_1523259757659_0003_01_01. Current good log dirs 
> are /grid/0/hadoop/yarn/log
> 2018-04-09 08:44:54,492 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs 
> for container container_1523259757659_0003_01_02. Current good log dirs 
> are /grid/0/hadoop/yarn/log
> 2018-04-09 08:44:55,470 INFO  collector.TimelineCollectorManager 
> (TimelineCollectorManager.java:remove(192)) - The collector service for 
> application_1523259757659_0003 was removed
> 2018-04-09 08:44:55,472 INFO  containermanager.ContainerManagerImpl 
> (ContainerManagerImpl.java:handle(1572)) - couldn't find application 
> application_1523259757659_0003 while processing FINISH_APPS event. The 
> ResourceManager allocated resources for this application to the NodeManager 
> but no active containers were found to process{code}
> The container id specified in the log, 
> *container_1523259757659_0003_01_02* is the one that has the finished 
> event missing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8266) [UI2] Clicking on application from cluster view should redirect to application attempt page

2018-05-14 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-8266:
--
Summary: [UI2] Clicking on application from cluster view should redirect to 
application attempt page  (was: Clicking on application from cluster view 
should redirect to application attempt page)

> [UI2] Clicking on application from cluster view should redirect to 
> application attempt page
> ---
>
> Key: YARN-8266
> URL: https://issues.apache.org/jira/browse/YARN-8266
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: YARN-8266.001.patch
>
>
> Steps:
> 1) Start one application
>  2) Go to cluster overview page
>  3) Click on applicationId from Cluster Resource Usage By Application
> This action redirects to the 
> [http://xxx:8088/ui2/#/yarn-app/application_1525740862939_0005] URL. This is an 
> invalid URL; it does not show any details.
> Instead, it should redirect to the application attempt page.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8292) Preemption of GPU resource does not happen if memory/vcores is not required to be preempted

2018-05-14 Thread Sumana Sathish (JIRA)
Sumana Sathish created YARN-8292:


 Summary: Preemption of GPU resource does not happen if 
memory/vcores is not required to be preempted
 Key: YARN-8292
 URL: https://issues.apache.org/jira/browse/YARN-8292
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Reporter: Sumana Sathish
Assignee: Tan, Wangda






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8278) DistributedScheduling not working in HA

2018-05-14 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475303#comment-16475303
 ] 

Weiwei Yang commented on YARN-8278:
---

Hi [~bibinchundatt]

Thanks for the patch. Please see my comments below.

1. ServerRMProxy is used by NMs to talk to the RM, and since the LocalRM running on 
NMs needs to talk to the RM through {{DistributedSchedulingAMProtocol}}, it makes 
sense to allow this protocol in the proxy. However, can we follow the approach 
used by the {{ClientRMProxy}} class for the check, by adding a 
{{ServerRMProtocols}} interface analogous to the following {{ClientRMProtocols}} 
(see the sketch after these comments)?
{code:java}
private interface ClientRMProtocols extends ApplicationClientProtocol, 
ApplicationMasterProtocol, ResourceManagerAdministrationProtocol {
  // Add nothing
}

Preconditions.checkArgument(protocol.isAssignableFrom(ClientRMProtocols.class), 
"RM does not support this client protocol");
{code}
2. {{TestServerRMProxy}}, line 31: throws IOException can be removed; line 33, 
defaultRMAddress can be removed.

3. Can you please fix the checkstyle issue?

Thanks
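
For illustration, a minimal sketch of the {{ServerRMProtocols}} idea from comment 1, applied to {{ServerRMProxy#checkAllowedProtocols}} (an assumption-level sketch, not the attached patch):

{code:java}
// Hypothetical sketch: the same marker-interface trick on the server side,
// so the NM-side proxy accepts both ResourceTracker and
// DistributedSchedulingAMProtocol.
private interface ServerRMProtocols
    extends DistributedSchedulingAMProtocol, ResourceTracker {
  // Add nothing
}

public void checkAllowedProtocols(Class protocol) {
  // A requested protocol is allowed if the marker interface extends it,
  // i.e. protocol.isAssignableFrom(ServerRMProtocols.class) returns true.
  Preconditions.checkArgument(
      protocol.isAssignableFrom(ServerRMProtocols.class),
      "ResourceManager does not support this protocol");
}
{code}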

> DistributedScheduling not working in HA
> ---
>
> Key: YARN-8278
> URL: https://issues.apache.org/jira/browse/YARN-8278
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Attachments: YARN-8278.001.patch
>
>
> Configured an HA cluster and submitted an application with the distributed 
> scheduling configuration enabled
> {code}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.yarn.exceptions.YarnRuntimeException):
>  java.lang.IllegalArgumentException: ResourceManager does not support this 
> protocol
> at 
> org.apache.hadoop.yarn.server.nodemanager.amrmproxy.DefaultRequestInterceptor.init(DefaultRequestInterceptor.java:91)
> at 
> org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AbstractRequestInterceptor.init(AbstractRequestInterceptor.java:82)
> at 
> org.apache.hadoop.yarn.server.nodemanager.scheduler.DistributedScheduler.init(DistributedScheduler.java:89)
> at 
> org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.initializePipeline(AMRMProxyService.java:450)
> at 
> org.apache.hadoop.yarn.server.nodemanager.amrmproxy.AMRMProxyService.processApplicationStartRequest(AMRMProxyService.java:369)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:942)
> at 
> org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:101)
> at 
> org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:223)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
> {code}
> {{ServerRMProxy#checkAllowedProtocols}} should allow 
> {{DistributedSchedulingAMProtocol}}
> {code}
>   public void checkAllowedProtocols(Class protocol) {
> Preconditions.checkArgument(
> protocol.isAssignableFrom(ResourceTracker.class),
> "ResourceManager does not support this protocol");
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8291) RMRegistryOperationService don't have limit on AsyncPurge threads

2018-05-14 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created YARN-8291:
---

 Summary: RMRegistryOperationService don't have limit on AsyncPurge 
threads
 Key: YARN-8291
 URL: https://issues.apache.org/jira/browse/YARN-8291
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.3
Reporter: Prabhu Joseph


When more than 1+ containers have finished, RMRegistryOperationService will 
create 1+ threads to perform AsyncPurge, which can slow down the 
ResourceManager process. There should be a limit on the number of threads.

{code}
"RegistryAdminService 554485" #824351 prio=5 os_prio=0 tid=0x7fe4b2bc9800 
nid=0xf8ed in Object.wait() [0x7fe31a5e4000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1386)
- locked <0x0007902ec7d8> (a org.apache.zookeeper.ClientCnxn$Packet)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1040)
at 
org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:172)
at 
org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:161)
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
at 
org.apache.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:158)
at 
org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:148)
at 
org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:36)
at 
org.apache.hadoop.registry.client.impl.zk.CuratorService.zkStat(CuratorService.java:455)
at 
org.apache.hadoop.registry.client.impl.zk.RegistryOperationsService.stat(RegistryOperationsService.java:137)
at 
org.apache.hadoop.registry.client.binding.RegistryUtils.statChildren(RegistryUtils.java:210)
at 
org.apache.hadoop.registry.server.services.RegistryAdminService.purge(RegistryAdminService.java:450)
at 
org.apache.hadoop.registry.server.services.RegistryAdminService.purge(RegistryAdminService.java:520)
at 
org.apache.hadoop.registry.server.services.RegistryAdminService$AsyncPurge.call(RegistryAdminService.java:570)
at 
org.apache.hadoop.registry.server.services.RegistryAdminService$AsyncPurge.call(RegistryAdminService.java:543)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}
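
Not from any patch, just a minimal sketch of the kind of limit being requested: funnel the purge work through a shared, bounded executor instead of one thread per finished container. Pool sizes and queue capacity below are illustrative assumptions.

{code:java}
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch of a shared, bounded pool for registry purge work.
public class BoundedPurgeExecutor {
  private final ExecutorService purgePool = new ThreadPoolExecutor(
      4, 16,                                      // core and max purge threads (illustrative)
      60, TimeUnit.SECONDS,                       // reclaim idle threads
      new LinkedBlockingQueue<Runnable>(1000),    // bounded backlog of purge requests
      new ThreadPoolExecutor.CallerRunsPolicy()); // back-pressure instead of more threads

  // Container-finished events would submit their existing AsyncPurge callable
  // (the class seen in the stack trace above) here instead of spawning a thread.
  public <T> Future<T> submitPurge(Callable<T> purge) {
    return purgePool.submit(purge);
  }
}
{code}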



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4353) Provide short circuit user group mapping for NM/AM

2018-05-14 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475269#comment-16475269
 ] 

Wilfred Spiegelenburg commented on YARN-4353:
-

[~templedf] it has been really quiet on this Jira for a long time. I have 
recently started to run into a similar issue to the one described here.
I dug a bit deeper and found that the group lookup is used especially in the 
ACL checks. This is taken from an NM log:
{code}
2018-03-09 19:14:50,881 DEBUG org.apache.hadoop.yarn.webapp.View: Rendering 
class 
org.apache.hadoop.yarn.server.nodemanager.webapp.ContainerLogsPage$ContainersLogsBlock
 @5
2018-03-09 19:14:50,882 DEBUG 
org.apache.hadoop.yarn.server.security.ApplicationACLsManager: Verifying 
access-type VIEW_APP for wilfred (auth:SIMPLE) on application 
application_1520622831944_0001 owned by systest
2018-03-09 19:14:50,888 DEBUG org.mortbay.log: loaded class 
com.sun.jndi.ldap.LdapCtxFactory from null
...
2018-03-09 19:14:51,163 WARN org.apache.hadoop.security.LdapGroupsMapping: 
Exception trying to get groups for user wilfred: [LDAP: error code 1 - 
04DC: LdapErr: DSID-0C09075A, comment: In order to perform this operation a 
successful bind must be completed on the connection., data 0, v1db1^@]
2018-03-09 19:14:51,164 WARN org.apache.hadoop.security.UserGroupInformation: 
No groups available for user wilfred
{code}

The group resolution is triggered when you set an ACL that has groups listed 
as allowed. The lookup happens when the user requesting access is not the 
application owner, an admin, or a user explicitly allowed access.

Using the {{NullGroupMapping}} would break ACLs.

The other proposed solution, passing the resolved groups in to the AM, is also 
not scalable. If there are thousands of users in the LDAP server and hundreds 
of groups, you would add a large overhead to the NM and then to the AM. You 
would also get into trouble with long-running applications: the group data 
would become stale and thus cause a security issue.
The AM also uses the mapping for the RPC protocol ACLs if you have those 
configured, so again a {{NullGroupMapping}} would break ACLs there too.

I propose closing this as Won't Fix. If you want to use the 
{{LdapGroupsMapping}}, you need to set up its configuration correctly.
 

> Provide short circuit user group mapping for NM/AM
> --
>
> Key: YARN-4353
> URL: https://issues.apache.org/jira/browse/YARN-4353
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: YARN-4353.prelim.patch
>
>
> When the NM launches an AM, the {{ContainerLocalizer}} gets the current user 
> from {{UserGroupInformation}}, which triggers user group mapping, even though 
> the user groups are never accessed.  If secure LDAP is configured for group 
> mapping, then there are some additional complications created by the 
> unnecessary group resolution.  Additionally, it adds unnecessary latency to 
> the container launch time.
> To address the issue, before getting the current user, the 
> {{ContainerLocalizer}} should configure {{UserGroupInformation}} with a null 
> group mapping service that quickly and quietly returns an empty group list 
> for all users.
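
For reference, a minimal sketch of what such a null mapping could look like, assuming the {{GroupMappingServiceProvider}} interface; this only illustrates the proposal under discussion, not a committed implementation:

{code:java}
import java.io.IOException;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.security.GroupMappingServiceProvider;

// Sketch of a "null" group mapping: every lookup quickly and quietly returns
// an empty group list, so no LDAP (or other) resolution is triggered.
public class NullGroupsMappingSketch implements GroupMappingServiceProvider {
  @Override
  public List<String> getGroups(String user) throws IOException {
    return Collections.emptyList();   // no groups, no external lookup
  }

  @Override
  public void cacheGroupsRefresh() throws IOException {
    // nothing cached, nothing to refresh
  }

  @Override
  public void cacheGroupsAdd(List<String> groups) throws IOException {
    // nothing to cache
  }
}
{code}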



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8289) Modify distributedshell to support Node Attributes

2018-05-14 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475264#comment-16475264
 ] 

Naganarasimha G R commented on YARN-8289:
-

Hi [~cheersyang], I have not completely thought this through, but I felt it is 
needed for the demo and is similar to your YARN-7745. I would start once at 
least a WIP for YARN-7863 is available. 

> Modify distributedshell to support Node Attributes
> --
>
> Key: YARN-8289
> URL: https://issues.apache.org/jira/browse/YARN-8289
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: distributed-shell
>Affects Versions: YARN-3409
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Major
>
> Modifications required in Distributed shell to support NodeAttributes



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8080) YARN native service should support component restart policy

2018-05-14 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475254#comment-16475254
 ] 

genericqa commented on YARN-8080:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
30s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
32s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 4s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 54s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
10s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m  
2s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m  5s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 74 new + 121 unchanged - 3 fixed = 195 total (was 124) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 27s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 
59s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
31s{color} | {color:green} hadoop-yarn-services-api in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
16s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 81m  4s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8080 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attac

[jira] [Commented] (YARN-8289) Modify distributedshell to support Node Attributes

2018-05-14 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475244#comment-16475244
 ] 

Weiwei Yang commented on YARN-8289:
---

Hi [~Naganarasimha]

Thanks for creating this JIRA. How do you plan to support this? Will it be 
specified as part of the {{-placement_spec}} argument?

> Modify distributedshell to support Node Attributes
> --
>
> Key: YARN-8289
> URL: https://issues.apache.org/jira/browse/YARN-8289
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: distributed-shell
>Affects Versions: YARN-3409
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Major
>
> Modifications required in Distributed shell to support NodeAttributes



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8080) YARN native service should support component restart policy

2018-05-14 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475243#comment-16475243
 ] 

genericqa commented on YARN-8080:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
35s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
6s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 27s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
10s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
49s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 22s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 70 new + 121 unchanged - 2 fixed = 191 total (was 123) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 11s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 50s{color} 
| {color:red} hadoop-yarn-services-core in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
37s{color} | {color:green} hadoop-yarn-services-api in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
20s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
36s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 94m 15s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.service.component.TestComponent |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JI

[jira] [Updated] (YARN-8288) Fix wrong number of table columns in Resource Model doc

2018-05-14 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8288:
--
Fix Version/s: 3.2.0

> Fix wrong number of table columns in Resource Model doc
> ---
>
> Key: YARN-8288
> URL: https://issues.apache.org/jira/browse/YARN-8288
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 3.0.3
>
> Attachments: YARN-8288.001.patch, after.jpg, before.jpg
>
>
> In the resource model doc, the resource-types.xml and node-resource.xml 
> description tables have the wrong number of columns defined; see 
> [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceModel.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8288) Fix wrong number of table columns in Resource Model doc

2018-05-14 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475239#comment-16475239
 ] 

Weiwei Yang commented on YARN-8288:
---

Thanks [~Naganarasimha]!

> Fix wrong number of table columns in Resource Model doc
> ---
>
> Key: YARN-8288
> URL: https://issues.apache.org/jira/browse/YARN-8288
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 3.0.3
>
> Attachments: YARN-8288.001.patch, after.jpg, before.jpg
>
>
> In the resource model doc, the resource-types.xml and node-resource.xml 
> description tables have the wrong number of columns defined; see 
> [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceModel.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8080) YARN native service should support component restart policy

2018-05-14 Thread Suma Shivaprasad (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-8080:
---
Attachment: YARN-8080.014.patch

> YARN native service should support component restart policy
> ---
>
> Key: YARN-8080
> URL: https://issues.apache.org/jira/browse/YARN-8080
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Wangda Tan
>Assignee: Suma Shivaprasad
>Priority: Critical
> Attachments: YARN-8080.001.patch, YARN-8080.002.patch, 
> YARN-8080.003.patch, YARN-8080.005.patch, YARN-8080.006.patch, 
> YARN-8080.007.patch, YARN-8080.009.patch, YARN-8080.010.patch, 
> YARN-8080.011.patch, YARN-8080.012.patch, YARN-8080.013.patch, 
> YARN-8080.014.patch
>
>
> The existing native service assumes the service is long running and never 
> finishes. Containers will be restarted even if the exit code == 0. 
> To support broader use cases, we need to allow the restart policy of a component 
> to be specified by users. Propose to have the following policies:
> 1) Always: containers are always restarted by the framework regardless of the 
> container exit status. This is the existing/default behavior.
> 2) Never: Do not restart containers in any case after a container finishes: to 
> support job-like workloads (for example a Tensorflow training job). If a task 
> exits with code == 0, we should not restart the task. This can be used by 
> services which are not restartable/recoverable.
> 3) On-failure: Similar to the above, only restart tasks with exit code != 0. 
> Behaviors after a component *instance* finalizes (Succeeded or Failed when 
> restart_policy != ALWAYS): 
> 1) For a single component, single instance: complete the service.
> 2) For a single component, multiple instances: other running instances from the 
> same component won't be affected by the finalized component instance. The service 
> will be terminated once all instances are finalized. 
> 3) For multiple components: the service will be terminated once all components 
> are finalized.
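
A minimal sketch of the restart decision these policies describe (names and structure are illustrative, not the service framework's actual API):

{code:java}
// Illustrative only: encodes the three restart policies described above.
public class RestartPolicySketch {
  enum RestartPolicy { ALWAYS, NEVER, ON_FAILURE }

  static boolean shouldRelaunch(RestartPolicy policy, int exitCode) {
    switch (policy) {
      case ALWAYS:     return true;              // existing/default behavior
      case NEVER:      return false;             // job-like workloads: never relaunch
      case ON_FAILURE: return exitCode != 0;     // relaunch only failed containers
      default:         return true;
    }
  }
}
{code}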



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8080) YARN native service should support component restart policy

2018-05-14 Thread Suma Shivaprasad (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475188#comment-16475188
 ] 

Suma Shivaprasad edited comment on YARN-8080 at 5/15/18 2:09 AM:
-

Attached a patch with fixes for:

1. Requesting containers according to the restart policy, and not every time 
containers exit as was happening earlier.
2. checkAndUpdateServiceState was not getting called for terminating components, 
which caused the service state to stay STARTED instead of becoming STABLE.
3. Fixed a failing UT


was (Author: suma.shivaprasad):
Attached patch with fixes to 

1. Request containers according to restart policy and not every time containers 
exit as was happening earlier.
2. checkAndUpdateServiceState was not getting called in case of terminating 
components which was causing Service state to be always STARTED instead of 
STABLE.

> YARN native service should support component restart policy
> ---
>
> Key: YARN-8080
> URL: https://issues.apache.org/jira/browse/YARN-8080
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Wangda Tan
>Assignee: Suma Shivaprasad
>Priority: Critical
> Attachments: YARN-8080.001.patch, YARN-8080.002.patch, 
> YARN-8080.003.patch, YARN-8080.005.patch, YARN-8080.006.patch, 
> YARN-8080.007.patch, YARN-8080.009.patch, YARN-8080.010.patch, 
> YARN-8080.011.patch, YARN-8080.012.patch, YARN-8080.013.patch, 
> YARN-8080.014.patch
>
>
> The existing native service assumes the service is long running and never 
> finishes. Containers will be restarted even if the exit code == 0. 
> To support broader use cases, we need to allow the restart policy of a component 
> to be specified by users. Propose to have the following policies:
> 1) Always: containers are always restarted by the framework regardless of the 
> container exit status. This is the existing/default behavior.
> 2) Never: Do not restart containers in any case after a container finishes: to 
> support job-like workloads (for example a Tensorflow training job). If a task 
> exits with code == 0, we should not restart the task. This can be used by 
> services which are not restartable/recoverable.
> 3) On-failure: Similar to the above, only restart tasks with exit code != 0. 
> Behaviors after a component *instance* finalizes (Succeeded or Failed when 
> restart_policy != ALWAYS): 
> 1) For a single component, single instance: complete the service.
> 2) For a single component, multiple instances: other running instances from the 
> same component won't be affected by the finalized component instance. The service 
> will be terminated once all instances are finalized. 
> 3) For multiple components: the service will be terminated once all components 
> are finalized.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8080) YARN native service should support component restart policy

2018-05-14 Thread Suma Shivaprasad (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475188#comment-16475188
 ] 

Suma Shivaprasad commented on YARN-8080:


Attached a patch with fixes for:

1. Requesting containers according to the restart policy, and not every time 
containers exit as was happening earlier.
2. checkAndUpdateServiceState was not getting called for terminating components, 
which caused the service state to stay STARTED instead of becoming STABLE.

> YARN native service should support component restart policy
> ---
>
> Key: YARN-8080
> URL: https://issues.apache.org/jira/browse/YARN-8080
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Wangda Tan
>Assignee: Suma Shivaprasad
>Priority: Critical
> Attachments: YARN-8080.001.patch, YARN-8080.002.patch, 
> YARN-8080.003.patch, YARN-8080.005.patch, YARN-8080.006.patch, 
> YARN-8080.007.patch, YARN-8080.009.patch, YARN-8080.010.patch, 
> YARN-8080.011.patch, YARN-8080.012.patch, YARN-8080.013.patch
>
>
> The existing native service assumes the service is long running and never 
> finishes. Containers will be restarted even if the exit code == 0. 
> To support broader use cases, we need to allow the restart policy of a component 
> to be specified by users. Propose to have the following policies:
> 1) Always: containers are always restarted by the framework regardless of the 
> container exit status. This is the existing/default behavior.
> 2) Never: Do not restart containers in any case after a container finishes: to 
> support job-like workloads (for example a Tensorflow training job). If a task 
> exits with code == 0, we should not restart the task. This can be used by 
> services which are not restartable/recoverable.
> 3) On-failure: Similar to the above, only restart tasks with exit code != 0. 
> Behaviors after a component *instance* finalizes (Succeeded or Failed when 
> restart_policy != ALWAYS): 
> 1) For a single component, single instance: complete the service.
> 2) For a single component, multiple instances: other running instances from the 
> same component won't be affected by the finalized component instance. The service 
> will be terminated once all instances are finalized. 
> 3) For multiple components: the service will be terminated once all components 
> are finalized.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8123) Skip compiling old hamlet package when the Java version is 10 or upper

2018-05-14 Thread Takanobu Asanuma (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475163#comment-16475163
 ] 

Takanobu Asanuma commented on YARN-8123:


Thanks for the patch, [~dineshchitlangia]!

I have confirmed that the command below, with the patch applied, succeeds on 
JDK 9 and JDK 10. +1 (non-binding).

{noformat}
mvn clean javadoc:javadoc --projects 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common
{noformat}

> Skip compiling old hamlet package when the Java version is 10 or upper
> --
>
> Key: YARN-8123
> URL: https://issues.apache.org/jira/browse/YARN-8123
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: webapp
> Environment: Java 10 or upper
>Reporter: Akira Ajisaka
>Assignee: Dinesh Chitlangia
>Priority: Major
>  Labels: newbie
> Attachments: YARN-8123.001.patch
>
>
> HADOOP-11423 skipped compiling the old hamlet package when the Java version is 9; 
> however, it is not skipped with Java 10+. We need to fix it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8080) YARN native service should support component restart policy

2018-05-14 Thread Suma Shivaprasad (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated YARN-8080:
---
Attachment: YARN-8080.013.patch

> YARN native service should support component restart policy
> ---
>
> Key: YARN-8080
> URL: https://issues.apache.org/jira/browse/YARN-8080
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Wangda Tan
>Assignee: Suma Shivaprasad
>Priority: Critical
> Attachments: YARN-8080.001.patch, YARN-8080.002.patch, 
> YARN-8080.003.patch, YARN-8080.005.patch, YARN-8080.006.patch, 
> YARN-8080.007.patch, YARN-8080.009.patch, YARN-8080.010.patch, 
> YARN-8080.011.patch, YARN-8080.012.patch, YARN-8080.013.patch
>
>
> Existing native service assumes the service is long running and never 
> finishes. Containers will be restarted even if exit code == 0. 
> To support broader use cases, we need to allow the restart policy of a component 
> to be specified by users. Propose to have the following policies:
> 1) Always: containers are always restarted by the framework regardless of container 
> exit status. This is the existing/default behavior.
> 2) Never: Do not restart containers in any case after a container finishes. This 
> supports job-like workloads (for example a Tensorflow training job): if a task 
> exits with code == 0, we should not restart it. This can be used by 
> services which are not restart/recovery-able.
> 3) On-failure: Similar to the above, only restart tasks with exit code != 0. 
> Behaviors after a component *instance* finalizes (Succeeded or Failed when 
> restart_policy != ALWAYS): 
> 1) For single component, single instance: the service completes.
> 2) For single component, multiple instances: other running instances from the 
> same component won't be affected by the finalized component instance. The 
> service will be terminated once all instances are finalized. 
> 3) For multiple components: the service will be terminated once all components 
> are finalized.
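
The relaunch decision implied by the three restart policies can be illustrated with a 
minimal, hypothetical Java sketch; the enum and method names below are illustrative, 
not the API from the attached patch:

{code:java}
// Hypothetical sketch only: how a framework could decide whether to relaunch a
// finished container instance, given the proposed restart policy and the exit code.
enum RestartPolicy { ALWAYS, NEVER, ON_FAILURE }

final class RestartDecision {
  static boolean shouldRelaunch(RestartPolicy policy, int exitCode) {
    switch (policy) {
      case ALWAYS:     return true;           // existing/default behavior
      case NEVER:      return false;          // job-like workload, never restart
      case ON_FAILURE: return exitCode != 0;  // restart only failed tasks
      default:         throw new IllegalArgumentException("Unknown policy: " + policy);
    }
  }
}
{code}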



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8081) Yarn Service Upgrade: Add support to upgrade a component

2018-05-14 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475113#comment-16475113
 ] 

genericqa commented on YARN-8081:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 37s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
33s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 11s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 4 new + 57 unchanged - 0 fixed = 61 total (was 57) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m  9s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 28m  7s{color} 
| {color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 11m  
3s{color} | {color:green} hadoop-yarn-services-core in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
32s{color} | {color:green} hadoop-yarn-services-api in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}114m 16s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.client.cli.TestYarnCLI |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8081 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12923353/YARN-8081.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux f5238b61bb96 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 
11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/pat

[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-05-14 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475100#comment-16475100
 ] 

Haibo Chen commented on YARN-8250:
--

[~leftnoteasy] Thanks for your comments.

I agree that we should avoid the CS vs. FS issue if possible. As I have 
mentioned, the rationale is to do things that are suitable for oversubscription 
but do not break or destabilize existing functionality.

Can you elaborate on what you mean by an issue we need to fix? I was 
describing a behavior that is fine except under over-allocation. Today the 
container scheduler tries to launch opportunistic containers whenever there is a 
container scheduling request, or whenever a container finishes. This is not an 
issue today. But in the case of over-allocation, because the utilization metrics 
are stale, the following can happen: a few containers finish, the container 
monitor checks the node utilization, which is low, and then the container 
scheduler gets the container-finished events and aggressively tries to start 
opportunistic containers. Then later the NM realizes that the opportunistic 
containers need to be preempted. 

I am not sure what can be done here to unify the two, as each fundamentally has 
issues with the other's approach. Hence the proposal to have two 
implementations.
{quote}For 1) I'm not sure if we should give all the decisions to CGroups
{quote}
One key thing to note is that we want to ensure GUARANTEED containers are not 
slowed down by OPPORTUNISTIC containers, so cgroups are a requirement for 
over-allocation from day one to ensure isolation. Unless the Docker container 
executor has similar mechanisms, it is hard to make over-allocation work 
properly with Docker without downsides that render the feature unusable.

 

I am open to suggestions to make things simpler and more maintainable, but as 
noted here, there are fundamental behavior changes. I'll take a look at 
whether there are more behaviors that we could extract into the base container 
scheduler.

> Create another implementation of ContainerScheduler to support NM 
> overallocation
> 
>
> Key: YARN-8250
> URL: https://issues.apache.org/jira/browse/YARN-8250
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-8250-YARN-1011.00.patch, 
> YARN-8250-YARN-1011.01.patch, YARN-8250-YARN-1011.02.patch
>
>
> YARN-6675 adds NM over-allocation support by modifying the existing 
> ContainerScheduler and providing a utilizationBased resource tracker.
> However, the implementation adds a lot of complexity to ContainerScheduler, 
> and future tweak of over-allocation strategy based on how much containers 
> have been launched is even more complicated.
> As such, this Jira proposes a new ContainerScheduler that always launch 
> guaranteed containers immediately and queues opportunistic containers. It 
> relies on a periodical check to launch opportunistic containers. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs

2018-05-14 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8041:
---
Issue Type: Sub-task  (was: Improvement)
Parent: YARN-7402

> [Router] Federation: routing some missing REST invocations transparently to 
> multiple RMs
> 
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: federation, router
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Minor
> Attachments: YARN-8041.001.patch, YARN-8041.002.patch, 
> YARN-8041.003.patch
>
>
> This Jira tracks the implementation of some missing REST invocation in 
> FederationInterceptorREST:
>  * getAppStatistics
>  * getNodeToLabels
>  * getLabelsOnNode
>  * updateApplicationPriority
>  * getAppQueue
>  * updateAppQueue
>  * getAppTimeout
>  * getAppTimeouts
>  * updateApplicationTimeout
>  * getAppAttempts
>  * getAppAttempt
>  * getContainers
>  * getContainer



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs

2018-05-14 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8041:
---
Affects Version/s: (was: 3.0.0)
   (was: 2.9.0)

> [Router] Federation: routing some missing REST invocations transparently to 
> multiple RMs
> 
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation, router
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Major
> Attachments: YARN-8041.001.patch, YARN-8041.002.patch, 
> YARN-8041.003.patch
>
>
> This Jira tracks the implementation of some missing REST invocation in 
> FederationInterceptorREST:
>  * getAppStatistics
>  * getNodeToLabels
>  * getLabelsOnNode
>  * updateApplicationPriority
>  * getAppQueue
>  * updateAppQueue
>  * getAppTimeout
>  * getAppTimeouts
>  * updateApplicationTimeout
>  * getAppAttempts
>  * getAppAttempt
>  * getContainers
>  * getContainer



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs

2018-05-14 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8041:
---
Priority: Minor  (was: Major)

> [Router] Federation: routing some missing REST invocations transparently to 
> multiple RMs
> 
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation, router
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Minor
> Attachments: YARN-8041.001.patch, YARN-8041.002.patch, 
> YARN-8041.003.patch
>
>
> This Jira tracks the implementation of some missing REST invocation in 
> FederationInterceptorREST:
>  * getAppStatistics
>  * getNodeToLabels
>  * getLabelsOnNode
>  * updateApplicationPriority
>  * getAppQueue
>  * updateAppQueue
>  * getAppTimeout
>  * getAppTimeouts
>  * updateApplicationTimeout
>  * getAppAttempts
>  * getAppAttempt
>  * getContainers
>  * getContainer



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs

2018-05-14 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8041:
---
Labels:   (was: patch)

> [Router] Federation: routing some missing REST invocations transparently to 
> multiple RMs
> 
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation, router
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Major
> Attachments: YARN-8041.001.patch, YARN-8041.002.patch, 
> YARN-8041.003.patch
>
>
> This Jira tracks the implementation of some missing REST invocation in 
> FederationInterceptorREST:
>  * getAppStatistics
>  * getNodeToLabels
>  * getLabelsOnNode
>  * updateApplicationPriority
>  * getAppQueue
>  * updateAppQueue
>  * getAppTimeout
>  * getAppTimeouts
>  * updateApplicationTimeout
>  * getAppAttempts
>  * getAppAttempt
>  * getContainers
>  * getContainer



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs

2018-05-14 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8041:
---
Target Version/s:   (was: 2.9.0, 3.0.0)

> [Router] Federation: routing some missing REST invocations transparently to 
> multiple RMs
> 
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation, router
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Major
> Attachments: YARN-8041.001.patch, YARN-8041.002.patch, 
> YARN-8041.003.patch
>
>
> This Jira tracks the implementation of some missing REST invocation in 
> FederationInterceptorREST:
>  * getAppStatistics
>  * getNodeToLabels
>  * getLabelsOnNode
>  * updateApplicationPriority
>  * getAppQueue
>  * updateAppQueue
>  * getAppTimeout
>  * getAppTimeouts
>  * updateApplicationTimeout
>  * getAppAttempts
>  * getAppAttempt
>  * getContainers
>  * getContainer



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs

2018-05-14 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8041:
---
Component/s: router

> [Router] Federation: routing some missing REST invocations transparently to 
> multiple RMs
> 
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation, router
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Major
> Attachments: YARN-8041.001.patch, YARN-8041.002.patch, 
> YARN-8041.003.patch
>
>
> This Jira tracks the implementation of some missing REST invocation in 
> FederationInterceptorREST:
>  * getAppStatistics
>  * getNodeToLabels
>  * getLabelsOnNode
>  * updateApplicationPriority
>  * getAppQueue
>  * updateAppQueue
>  * getAppTimeout
>  * getAppTimeouts
>  * updateApplicationTimeout
>  * getAppAttempts
>  * getAppAttempt
>  * getContainers
>  * getContainer



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs

2018-05-14 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475089#comment-16475089
 ] 

Giovanni Matteo Fumarola edited comment on YARN-8041 at 5/15/18 12:18 AM:
--

Thanks [~yiran] for working on this.
I have already added getAppState in YARN-8186.
Please rebase the patch with the current trunk.
I will review the patch after that.


was (Author: giovanni.fumarola):
Thanks [~yiran] for working on this.
I have already added getAppState in YARN-8186.
Please rebase the patch with the current trunk.

> [Router] Federation: routing some missing REST invocations transparently to 
> multiple RMs
> 
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Yiran Wu
>Priority: Major
>  Labels: patch
> Attachments: YARN-8041.001.patch, YARN-8041.002.patch, 
> YARN-8041.003.patch
>
>
> This Jira tracks the implementation of some missing REST invocation in 
> FederationInterceptorREST:
>  * getAppStatistics
>  * getNodeToLabels
>  * getLabelsOnNode
>  * updateApplicationPriority
>  * getAppQueue
>  * updateAppQueue
>  * getAppTimeout
>  * getAppTimeouts
>  * updateApplicationTimeout
>  * getAppAttempts
>  * getAppAttempt
>  * getContainers
>  * getContainer



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs

2018-05-14 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola reassigned YARN-8041:
--

Assignee: Yiran Wu

> [Router] Federation: routing some missing REST invocations transparently to 
> multiple RMs
> 
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Yiran Wu
>Assignee: Yiran Wu
>Priority: Major
>  Labels: patch
> Attachments: YARN-8041.001.patch, YARN-8041.002.patch, 
> YARN-8041.003.patch
>
>
> This Jira tracks the implementation of some missing REST invocation in 
> FederationInterceptorREST:
>  * getAppStatistics
>  * getNodeToLabels
>  * getLabelsOnNode
>  * updateApplicationPriority
>  * getAppQueue
>  * updateAppQueue
>  * getAppTimeout
>  * getAppTimeouts
>  * updateApplicationTimeout
>  * getAppAttempts
>  * getAppAttempt
>  * getContainers
>  * getContainer



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs

2018-05-14 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8041:
---
Description: 
This Jira tracks the implementation of some missing REST invocation in 
FederationInterceptorREST:
 * getAppStatistics
 * getNodeToLabels
 * getLabelsOnNode
 * updateApplicationPriority
 * getAppQueue
 * updateAppQueue
 * getAppTimeout
 * getAppTimeouts
 * updateApplicationTimeout
 * getAppAttempts
 * getAppAttempt
 * getContainers
 * getContainer

  was:
This Jira tracks the implementation of some missing REST invocation in 
FederationInterceptorREST:

* getAppStatistics;
*getAppState/getNodeToLabels/getLabelsOnNode/updateApplicationPriority/getAppQueue/updateAppQueue/getAppTimeout/getAppTimeouts/updateApplicationTimeout/getAppAttempts/getAppAttempt/getContainers/getContainer
 REST invocations transparently to multiple RMs 


> [Router] Federation: routing some missing REST invocations transparently to 
> multiple RMs
> 
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Yiran Wu
>Priority: Major
>  Labels: patch
> Attachments: YARN-8041.001.patch, YARN-8041.002.patch, 
> YARN-8041.003.patch
>
>
> This Jira tracks the implementation of some missing REST invocation in 
> FederationInterceptorREST:
>  * getAppStatistics
>  * getNodeToLabels
>  * getLabelsOnNode
>  * updateApplicationPriority
>  * getAppQueue
>  * updateAppQueue
>  * getAppTimeout
>  * getAppTimeouts
>  * updateApplicationTimeout
>  * getAppAttempts
>  * getAppAttempt
>  * getContainers
>  * getContainer



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs

2018-05-14 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8041:
---
Description: 
This Jira tracks the implementation of some missing REST invocation in 
FederationInterceptorREST:

* getAppStatistics;
*getAppState/getNodeToLabels/getLabelsOnNode/updateApplicationPriority/getAppQueue/updateAppQueue/getAppTimeout/getAppTimeouts/updateApplicationTimeout/getAppAttempts/getAppAttempt/getContainers/getContainer
 REST invocations transparently to multiple RMs 

  was:
Implement routing 
getAppStatistics/getAppState/getNodeToLabels/getLabelsOnNode/updateApplicationPriority/getAppQueue/updateAppQueue/getAppTimeout/getAppTimeouts/updateApplicationTimeout/getAppAttempts/getAppAttempt/getContainers/getContainer
 REST invocations transparently to multiple RMs 





> [Router] Federation: routing some missing REST invocations transparently to 
> multiple RMs
> 
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Yiran Wu
>Priority: Major
>  Labels: patch
> Attachments: YARN-8041.001.patch, YARN-8041.002.patch, 
> YARN-8041.003.patch
>
>
> This Jira tracks the implementation of some missing REST invocation in 
> FederationInterceptorREST:
> * getAppStatistics;
> *getAppState/getNodeToLabels/getLabelsOnNode/updateApplicationPriority/getAppQueue/updateAppQueue/getAppTimeout/getAppTimeouts/updateApplicationTimeout/getAppAttempts/getAppAttempt/getContainers/getContainer
>  REST invocations transparently to multiple RMs 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8041) [Router] Federation: routing some missing REST invocations transparently to multiple RMs

2018-05-14 Thread Giovanni Matteo Fumarola (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giovanni Matteo Fumarola updated YARN-8041:
---
Summary: [Router] Federation: routing some missing REST invocations 
transparently to multiple RMs  (was: Federation: Implement multiple 
interfaces(14 interfaces), routing REST invocations transparently to multiple 
RMs )

> [Router] Federation: routing some missing REST invocations transparently to 
> multiple RMs
> 
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Yiran Wu
>Priority: Major
>  Labels: patch
> Attachments: YARN-8041.001.patch, YARN-8041.002.patch, 
> YARN-8041.003.patch
>
>
> Implement routing 
> getAppStatistics/getAppState/getNodeToLabels/getLabelsOnNode/updateApplicationPriority/getAppQueue/updateAppQueue/getAppTimeout/getAppTimeouts/updateApplicationTimeout/getAppAttempts/getAppAttempt/getContainers/getContainer
>  REST invocations transparently to multiple RMs 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8041) Federation: Implement multiple interfaces(14 interfaces), routing REST invocations transparently to multiple RMs

2018-05-14 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475089#comment-16475089
 ] 

Giovanni Matteo Fumarola commented on YARN-8041:


Thanks [~yiran] for working on this.
I have already added getAppState in YARN-8186.
Please rebase the patch with the current trunk.

> Federation: Implement multiple interfaces(14 interfaces), routing REST 
> invocations transparently to multiple RMs 
> -
>
> Key: YARN-8041
> URL: https://issues.apache.org/jira/browse/YARN-8041
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Affects Versions: 2.9.0, 3.0.0
>Reporter: Yiran Wu
>Priority: Major
>  Labels: patch
> Attachments: YARN-8041.001.patch, YARN-8041.002.patch, 
> YARN-8041.003.patch
>
>
> Implement routing 
> getAppStatistics/getAppState/getNodeToLabels/getLabelsOnNode/updateApplicationPriority/getAppQueue/updateAppQueue/getAppTimeout/getAppTimeouts/updateApplicationTimeout/getAppAttempts/getAppAttempt/getContainers/getContainer
>  REST invocations transparently to multiple RMs 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8180) YARN Federation has not implemented blacklist sub-cluster for AM routing

2018-05-14 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475081#comment-16475081
 ] 

Giovanni Matteo Fumarola edited comment on YARN-8180 at 5/15/18 12:11 AM:
--

Thanks [~shenyinjie] for opening the Jira.
The blacklist context in {{FederationClientInterceptor#submitApplication}} is 
just to keep track of the retries and avoid resubmitting an AM to a sub-cluster 
where we already tried.
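
To make the retry bookkeeping concrete, here is a minimal, hedged sketch of the loop 
described in the issue; chooseHomeSubCluster() and trySubmit() are hypothetical 
stand-ins and this is not the actual FederationClientInterceptor code:

{code:java}
import java.util.ArrayList;
import java.util.List;

final class SubmissionRetrySketch {
  // Sketch of the retry/blacklist bookkeeping: each failed sub-cluster is added
  // to the blacklist so the next retry picks a different home sub-cluster.
  static String submitWithRetries(Object appContext, int numSubmitRetries) {
    List<String> blacklist = new ArrayList<>();
    for (int i = 0; i < numSubmitRetries; ++i) {
      String subClusterId = chooseHomeSubCluster(appContext, blacklist);
      if (trySubmit(subClusterId, appContext)) {
        return subClusterId;         // submitted successfully, stop retrying
      }
      blacklist.add(subClusterId);   // remember where we already tried
    }
    return null;                     // all retries exhausted
  }

  // Hypothetical helpers, present only so the sketch is self-contained.
  private static String chooseHomeSubCluster(Object ctx, List<String> blacklist) {
    return "subcluster-" + blacklist.size();
  }

  private static boolean trySubmit(String subClusterId, Object ctx) {
    return false;
  }
}
{code}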

[~abmodi] please update the documentation. We added the property to the 
documentation by mistake.

 


was (Author: giovanni.fumarola):
Thanks [~shenyinjie] for opening the Jira.
The blacklist context in \{{FederationClientInterceptor#submitApplication}} is 
just to keep track of the retries and avoid to resubmit an AM where we already 
tried.

[~abmodi] please update the documentation. We may add the property to the 
documentation by mistake.

 

> YARN Federation has not implemented blacklist sub-cluster for AM routing
> 
>
> Key: YARN-8180
> URL: https://issues.apache.org/jira/browse/YARN-8180
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Reporter: Shen Yinjie
>Assignee: Abhishek Modi
>Priority: Major
>
> Property "yarn.federation.blacklist-subclusters" is defined in 
> yarn-federation doc, but it has not been defined and implemented in the Java code.
> In FederationClientInterceptor#submitApplication()
> {code:java}
> List<SubClusterId> blacklist = new ArrayList<>();
> for (int i = 0; i < numSubmitRetries; ++i) {
> SubClusterId subClusterId = policyFacade.getHomeSubcluster(
> request.getApplicationSubmissionContext(), blacklist);
> {code}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8180) YARN Federation has not implemented blacklist sub-cluster for AM routing

2018-05-14 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475081#comment-16475081
 ] 

Giovanni Matteo Fumarola commented on YARN-8180:


Thanks [~shenyinjie] for opening the Jira.
The blacklist context in {{FederationClientInterceptor#submitApplication}} is 
just to keep track of the retries and avoid resubmitting an AM to a sub-cluster 
where we already tried.

[~abmodi] please update the documentation. We may have added the property to the 
documentation by mistake.

 

> YARN Federation has not implemented blacklist sub-cluster for AM routing
> 
>
> Key: YARN-8180
> URL: https://issues.apache.org/jira/browse/YARN-8180
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: federation
>Reporter: Shen Yinjie
>Assignee: Abhishek Modi
>Priority: Major
>
> Property "yarn.federation.blacklist-subclusters" is defined in 
> yarn-federation doc, but it has not been defined and implemented in the Java code.
> In FederationClientInterceptor#submitApplication()
> {code:java}
> List<SubClusterId> blacklist = new ArrayList<>();
> for (int i = 0; i < numSubmitRetries; ++i) {
> SubClusterId subClusterId = policyFacade.getHomeSubcluster(
> request.getApplicationSubmissionContext(), blacklist);
> {code}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8290) Yarn application failed to recover with "Error Launching job : User is not set in the application report" error after RM restart

2018-05-14 Thread Yesha Vora (JIRA)
Yesha Vora created YARN-8290:


 Summary: Yarn application failed to recover with "Error Launching 
job : User is not set in the application report" error after RM restart
 Key: YARN-8290
 URL: https://issues.apache.org/jira/browse/YARN-8290
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yesha Vora


Scenario:

1) Start 5 streaming applications in the background

2) Kill Active RM and cause RM failover

After RM failover, The application failed with below error.

{code}18/02/01 21:24:29 WARN client.RequestHedgingRMFailoverProxyProvider: 
Invocation returned exception on [rm2] : 
org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
with id 'application_1517520038847_0003' doesn't exist in RM. Please check that 
the job submission was successful.
at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:338)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:175)
at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:417)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
, so propagating back to caller.
18/02/01 21:24:29 INFO impl.YarnClientImpl: Submitted application 
application_1517520038847_0003
18/02/01 21:24:30 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
/user/hrt_qa/.staging/job_1517520038847_0003
18/02/01 21:24:30 ERROR streaming.StreamJob: Error Launching job : User is not 
set in the application report
Streaming Command Failed!{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-05-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475062#comment-16475062
 ] 

Wangda Tan commented on YARN-8250:
--

[~haibochen], 

I took a very brief look at the implemented code, since I haven't had a chance to 
read through the whole implementation.

My thoughts:
- To me it is important to have a single implementation with different policies, 
or to just fix it correctly. Otherwise it will turn into the CS vs. FS issue shortly 
after this.
- For 2), it looks like an issue we need to fix: why would we want to keep logic 
that aggressively launches O containers and lets them be killed by the framework 
shortly after launch? 
- For 1), I'm not sure if we should give all the decisions to CGroups. In some 
cases killing a container cannot be done immediately by the system IIRC (like a Docker 
container); it's better to look at the existing status of running containers 
before launching a container.

> Create another implementation of ContainerScheduler to support NM 
> overallocation
> 
>
> Key: YARN-8250
> URL: https://issues.apache.org/jira/browse/YARN-8250
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-8250-YARN-1011.00.patch, 
> YARN-8250-YARN-1011.01.patch, YARN-8250-YARN-1011.02.patch
>
>
> YARN-6675 adds NM over-allocation support by modifying the existing 
> ContainerScheduler and providing a utilizationBased resource tracker.
> However, the implementation adds a lot of complexity to ContainerScheduler, 
> and future tweak of over-allocation strategy based on how much containers 
> have been launched is even more complicated.
> As such, this Jira proposes a new ContainerScheduler that always launch 
> guaranteed containers immediately and queues opportunistic containers. It 
> relies on a periodical check to launch opportunistic containers. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7900) [AMRMProxy] AMRMClientRelayer for stateful FederationInterceptor

2018-05-14 Thread Giovanni Matteo Fumarola (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475061#comment-16475061
 ] 

Giovanni Matteo Fumarola commented on YARN-7900:


Overall the patch looks good to me.
Please add some more javadoc in the {{TestAMRMClientRelayer}} and we can commit 
it.

> [AMRMProxy] AMRMClientRelayer for stateful FederationInterceptor
> 
>
> Key: YARN-7900
> URL: https://issues.apache.org/jira/browse/YARN-7900
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Botong Huang
>Assignee: Botong Huang
>Priority: Major
> Attachments: YARN-7900.v1.patch, YARN-7900.v2.patch, 
> YARN-7900.v3.patch, YARN-7900.v4.patch, YARN-7900.v5.patch, 
> YARN-7900.v6.patch, YARN-7900.v7.patch, YARN-7900.v8.patch
>
>
> Inside stateful FederationInterceptor (YARN-7899), we need a component 
> similar to AMRMClient that remembers all pending (outstands) requests we've 
> sent to YarnRM, auto re-register and do full pending resend when YarnRM fails 
> over and throws ApplicationMasterNotRegisteredException back. This JIRA adds 
> this component as AMRMClientRelayer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups

2018-05-14 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475059#comment-16475059
 ] 

Miklos Szegedi commented on YARN-4599:
--

{quote}bq. Is 'descriptors->event_control_fd = -1;'  necessary?
{quote}
Yes, it is a defense against chained errors; it may make it easier to debug when 
you get a core dump.
{quote}bq. 3) The comments for test_oom() does not quite make sense to me. My 
current understanding is that it adds the calling process to the given pgroup 
and simulates an OOM by keep asking OS for memory?
{quote}
You are mixing the parent with the child. The parent gets the child pid and the 
child gets 0 after the fork() since the child can just call getpid(). It forks 
a child process, gets its pid in the parent, and adds that to a cgroup. Once the 
child notices that it is in the cgroup, it starts eating memory, triggering an 
OOM.
{quote}bq. 4) Can you please elaborate on how cgroup simulation is done in 
oom_listener_test_main.c? The child process that is added to the cgroup only 
does sleep().
{quote}
/*
 Unit test for cgroup testing. There are two modes.
 If the unit test is run as root and we have cgroups
 we try to create a cgroup and generate an OOM.
 If we are not running as root we just sleep instead
 of eating memory and simulate the OOM by sending
 an event in a mock event fd mock_oom_event_as_user.
*/
{quote}bq. 5) Doing a param matching in CGroupsHandlerImpl.GetCGroupParam() 
does not seem a good practice to me.
{quote}
CGroupsHandlerImpl.GetCGroupParam() is a smart function that returns the file 
name given the parameter name. I do not see any good practice issue here. The 
tasks file is always without the controller name.
{quote}bq. 6) Let's wrap the new thread join in ContainersMonitorImpl with 
try-catch clause as we do with the monitoring thread.
{quote}
May I ask why? I thought only exceptions that will actually be thrown need to 
be caught. CGroupElasticMemoryController has a much better cleanup process than 
the monitoring thread and it does not need InterruptedException. In fact any 
interrupted exception would mean that we have likely leaked the external 
process, so I would advise against using it.
{quote}bq. 7) The configuration changes are incompatible  ... How about we 
create separate configurations for pm elastic control and vm elastic control?
{quote}
I do not necessarily agree here.

a) First of all polling and cgroups memory control did not work together before 
the patch either. The NM exited with an exception, so there is no 
functionality that worked before but does not work now. There is no 
compatibility break. cgroups taking precedence is indeed a new feature.

b) I would like to have a clean design in the long term for configuration 
avoiding too many configuration entries and definitely avoiding confusion. If 
there is a yarn.nodemanager.pmem-check-enabled, it suggests general use, and it 
would be unintuitive not to use it. We indeed cannot change its general 
meaning anymore. I think the clean design is having 
yarn.nodemanager.resource.memory.enabled to enable cgroups, 
yarn.nodemanager.resource.memory.enforced to enforce it per container and 
yarn.nodemanager.elastic-memory-control.enabled to enforce it at the node 
level. The detailed settings like yarn.nodemanager.pmem-check-enabled and 
yarn.nodemanager.vmem-check-enabled can only intuitively apply to all of them. 
I understand the concern, but this solution would let us keep only these five 
configuration entries.
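
As a purely illustrative sketch (not from the patch), the configuration keys named 
above could be read with the standard Hadoop Configuration API; the defaults and the 
precedence shown here only follow the description in this comment:

{code:java}
import org.apache.hadoop.conf.Configuration;

final class ElasticMemoryConfigSketch {
  // Hedged sketch: the defaults and the precedence order are assumptions based
  // on the discussion above, not the values used by the actual patch.
  static String resolveMemoryControlMode(Configuration conf) {
    boolean cgroupsMemory =
        conf.getBoolean("yarn.nodemanager.resource.memory.enabled", false);
    boolean perContainerEnforce =
        conf.getBoolean("yarn.nodemanager.resource.memory.enforced", true);
    boolean elasticNodeLevel =
        conf.getBoolean("yarn.nodemanager.elastic-memory-control.enabled", false);
    boolean pmemCheck =
        conf.getBoolean("yarn.nodemanager.pmem-check-enabled", true);

    if (cgroupsMemory && elasticNodeLevel) {
      return "cgroups memory control, enforced elastically at the node level";
    } else if (cgroupsMemory && perContainerEnforce) {
      return "cgroups memory control, enforced per container";
    } else if (pmemCheck) {
      return "polling-based pmem check";
    }
    return "no memory enforcement";
  }
}
{code}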

11) Does it make sense to have the stopListening logic in `if (!watchdog.get) 
{}` block instead?

It is completely equivalent. It will be called a few milliseconds earlier or 
later, but there was a missing explanation there, so I added a comment.
{quote}bq. 16) In TestDefaultOOMHandler.testBothContainersOOM(), I think we 
also need to verify container 2 is killed. Similarly, in  testOneContainerOOM() 
and  testNoContainerOOM().
{quote}
Only one container should be killed. However, I refined the verify logic to be 
even more precise in verifying this.

I addressed the rest. I will provide a patch soon.

 

> Set OOM control for memory cgroups
> --
>
> Key: YARN-4599
> URL: https://issues.apache.org/jira/browse/YARN-4599
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Miklos Szegedi
>Priority: Major
>  Labels: oct16-medium
> Attachments: Elastic Memory Control in YARN.pdf, YARN-4599.000.patch, 
> YARN-4599.001.patch, YARN-4599.002.patch, YARN-4599.003.patch, 
> YARN-4599.004.patch, YARN-4599.005.patch, YARN-4599.006.patch, 
> YARN-4599.sandflee.patch, yarn-4599-not-so-useful.patch
>
>
> YARN-1856 adds memory cgroups enforcing support. 

[jira] [Updated] (YARN-8248) Job hangs when a job requests a resource that its queue does not have

2018-05-14 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-8248:
-
Summary: Job hangs when a job requests a resource that its queue does not 
have  (was: Job hangs when a queue is specified and the maxResources of the 
queue cannot satisfy the AM resource request)

> Job hangs when a job requests a resource that its queue does not have
> -
>
> Key: YARN-8248
> URL: https://issues.apache.org/jira/browse/YARN-8248
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, yarn
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8248-001.patch, YARN-8248-002.patch, 
> YARN-8248-003.patch, YARN-8248-004.patch, YARN-8248-005.patch, 
> YARN-8248-006.patch
>
>
> Job hangs when mapreduce.job.queuename is specified and the queue has 0 of 
> any resource (vcores / memory / other)
> In this scenario, the job should be immediately rejected upon submission 
> since the specified queue cannot serve the resource needs of the submitted 
> job.
>  
> Command to run:
> {code:java}
> bin/yarn jar 
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" 
> pi -Dmapreduce.job.queuename=sample_queue 1 1000;{code}
> fair-scheduler.xml queue config (excerpt):
>  
> {code:java}
>  
> 1 mb,0vcores
> 9 mb,0vcores
> 50
> -1.0f
> 2.0
> fair
>   
> {code}
> Diagnostic message from the web UI: 
> {code:java}
> Wed May 02 06:35:57 -0700 2018] Application is added to the scheduler and is 
> not yet activated. (Resource request:  exceeds current 
> queue or its parents maximum resource allowed).{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8248) Job hangs when a queue is specified and the maxResources of the queue cannot satisfy the AM resource request

2018-05-14 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475058#comment-16475058
 ] 

Haibo Chen commented on YARN-8248:
--

{quote}as {{RMAppManager.validateAndCreateResourceRequest()}} can return a null 
value for the AM requests,
{quote}
Good catch! It does indeed return null if the AM is unmanaged. But I am not 
sure how the debug message helps diagnose this issue. I'd prefer we remove the 
debug message
{quote} Is this explanation makes it cleaner?
{quote}
Yes. That makes sense. Comments would be very helpful in this case. We could also 
maybe reverse the order of the two conditions. The current diagnostic message 
seems good to me now that I understand what the condition means.
{quote} So in my understanding, it can happen that in {{addApplication()}} the 
app was not rejected, for example AM does not request vCores and we have 0 
vCores configure as max resources, but for a map container, 1 vCores is 
requested.
{quote}
Indeed, that can happen with custom resource types. In FairScheduler.allocate(), 
instead of rejecting an application if any request is rejected, we can just 
filter out the ones that should be rejected by removing them from the ask 
list (with a warning log) and proceed. Rejecting an application after it has 
started running (FairScheduler.allocate() is called remotely by the AM) seems 
counter-intuitive. I think we can signal the AM by throwing a 
SchedulerInvalidResoureRequestException, which is propagated to the AM. What do you 
think?
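
As a rough, hedged illustration of the filtering idea (simplified types, not the real 
FairScheduler/Resource classes):

{code:java}
import java.util.Iterator;
import java.util.List;

final class AskFilterSketch {
  // Simplified, hypothetical stand-ins for a resource request and the queue's
  // maximum resources; they are not the actual YARN types.
  static final class Ask { long memoryMb; int vcores; }
  static final class QueueMax { long memoryMb; int vcores; }

  /**
   * Removes asks that can never be satisfied by the queue's maximum resources,
   * so allocate() can proceed with the remaining asks instead of rejecting the
   * whole application. Returns the number of dropped asks so the caller can
   * log a warning and/or signal the AM.
   */
  static int filterUnsatisfiableAsks(List<Ask> asks, QueueMax max) {
    int removed = 0;
    for (Iterator<Ask> it = asks.iterator(); it.hasNext();) {
      Ask ask = it.next();
      if (ask.memoryMb > max.memoryMb || ask.vcores > max.vcores) {
        it.remove();
        removed++;
      }
    }
    return removed;
  }
}
{code}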
{quote}About the uncovered unit test: Good point and I was thinking about that 
if we can reject an application only if the AM request is greater than 0 and we 
have 0 configured as max resource or simply in any case where the requested 
resource is greater than max resource, regardless if it is 0 or not.
{quote}
Never mind comment 4). That was based on my previous misunderstanding. If the AM 
request is larger than the non-zero max-resource (steady fair share), we 
should not reject, because the queue may get an instantaneous fair share that is 
large enough. That's not related to this patch.

 

Let me know if something does not make sense.

 

 

 

> Job hangs when a queue is specified and the maxResources of the queue cannot 
> satisfy the AM resource request
> 
>
> Key: YARN-8248
> URL: https://issues.apache.org/jira/browse/YARN-8248
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, yarn
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8248-001.patch, YARN-8248-002.patch, 
> YARN-8248-003.patch, YARN-8248-004.patch, YARN-8248-005.patch, 
> YARN-8248-006.patch
>
>
> Job hangs when mapreduce.job.queuename is specified and the queue has 0 of 
> any resource (vcores / memory / other)
> In this scenario, the job should be immediately rejected upon submission 
> since the specified queue cannot serve the resource needs of the submitted 
> job.
>  
> Command to run:
> {code:java}
> bin/yarn jar 
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" 
> pi -Dmapreduce.job.queuename=sample_queue 1 1000;{code}
> fair-scheduler.xml queue config (excerpt):
>  
> {code:java}
>  
> 1 mb,0vcores
> 9 mb,0vcores
> 50
> -1.0f
> 2.0
> fair
>   
> {code}
> Diagnostic message from the web UI: 
> {code:java}
> Wed May 02 06:35:57 -0700 2018] Application is added to the scheduler and is 
> not yet activated. (Resource request:  exceeds current 
> queue or its parents maximum resource allowed).{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-05-14 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475011#comment-16475011
 ] 

Haibo Chen commented on YARN-8250:
--

My understanding of SHED_QUEUED_CONTAINERS is that it notifies the container 
scheduler to get rid of some opportunistic containers being queued. The intent of the 
new SCHEDULER_CONTAINERS is to let the container scheduler try to launch opportunistic 
containers that are currently being queued. A follow-up would also reuse 
SCHEDULER_CONTAINERS to preempt running opportunistic containers. I am not 
sure how best to align the two.

The main reasons why we'd like to introduce a new container scheduler are

1) Minimize the impact on GUARANTEED containers from over-allocating the node with 
OPPORTUNISTIC containers. The queuing time of GUARANTEED containers would increase 
with more running OPPORTUNISTIC containers, which is the case with 
over-allocation. The code as in YARN-6675 gets complicated. Alternatively, we 
could launch GUARANTEED containers immediately and rely on the cgroup mechanism 
for preemption. 

2) Avoid aggressive OPPORTUNISTIC container launching. One thing to note is 
that in the case of over-allocation, we rely on the resource utilization metrics 
to decide how many resources we can use to launch OPPORTUNISTIC containers. The 
resource utilization metrics in the NM are unfortunately only updated every few 
seconds. This can be problematic in that the NM could end up launching too 
many OPPORTUNISTIC containers before the metric is updated. The current default 
container scheduler launches containers aggressively, which could cause 
containers to be launched and killed shortly after. The new container 
scheduler only schedules OPPORTUNISTIC containers once whenever the utilization 
metric is updated.

It is my understanding that removing GUARANTEED container queuing would 
destabilize cases like yours where nodes are running at high utilization, 
and scheduling OPPORTUNISTIC containers only every few seconds would delay 
launch time in distributed scheduling. 

Hence, we created a pluggable container scheduler so that we can choose to do 
things differently without causing issues for existing use cases. The new 
container scheduler should probably be named or documented so that it is only 
used when over-allocation is enabled.
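
A minimal sketch of the proposed scheduling behavior, with hypothetical types and 
method names (not the real ContainerScheduler API):

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

final class OverAllocationSchedulerSketch {
  interface Container {
    void launch();
    boolean isGuaranteed();
  }

  private final Queue<Container> opportunisticQueue = new ArrayDeque<>();

  /** GUARANTEED containers start immediately; OPPORTUNISTIC ones are queued. */
  void onStartRequest(Container c) {
    if (c.isGuaranteed()) {
      c.launch();
    } else {
      opportunisticQueue.add(c);
    }
  }

  /**
   * Invoked only when a fresh node utilization snapshot arrives, instead of on
   * every container-finished event, to avoid launching OPPORTUNISTIC containers
   * against stale utilization numbers. The 0.2/0.1 thresholds are placeholders.
   */
  void onUtilizationUpdate(double freeCapacityFraction) {
    while (!opportunisticQueue.isEmpty() && freeCapacityFraction > 0.2) {
      opportunisticQueue.poll().launch();
      freeCapacityFraction -= 0.1;  // crude bookkeeping for the sketch only
    }
  }
}
{code}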

 

 

> Create another implementation of ContainerScheduler to support NM 
> overallocation
> 
>
> Key: YARN-8250
> URL: https://issues.apache.org/jira/browse/YARN-8250
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-8250-YARN-1011.00.patch, 
> YARN-8250-YARN-1011.01.patch, YARN-8250-YARN-1011.02.patch
>
>
> YARN-6675 adds NM over-allocation support by modifying the existing 
> ContainerScheduler and providing a utilizationBased resource tracker.
> However, the implementation adds a lot of complexity to ContainerScheduler, 
> and future tweak of over-allocation strategy based on how much containers 
> have been launched is even more complicated.
> As such, this Jira proposes a new ContainerScheduler that always launch 
> guaranteed containers immediately and queues opportunistic containers. It 
> relies on a periodical check to launch opportunistic containers. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8123) Skip compiling old hamlet package when the Java version is 10 or upper

2018-05-14 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475002#comment-16475002
 ] 

genericqa commented on YARN-8123:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
34s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
33m 32s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 27s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
13s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 51m 59s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8123 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12923349/YARN-8123.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  xml  |
| uname | Linux a15d2008e37a 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 
19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 2d00a0c |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20720/testReport/ |
| Max. process+thread count | 405 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20720/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Skip compiling old hamlet package when the Java version is 10 or upper
> --
>
> Key: YARN-8123
> URL: https://issues.apache.org/jira/browse/YARN-8123
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: webapp
> Environment: Java 10 or upper
>Reporter: Akira Aj

[jira] [Commented] (YARN-6919) Add default volume mount list

2018-05-14 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474996#comment-16474996
 ] 

genericqa commented on YARN-6919:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
30s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 42s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
8s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
40s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 22s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 1 new + 237 unchanged - 0 fixed = 238 total (was 237) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 19s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 45s{color} 
| {color:red} hadoop-yarn-api in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 
27s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}101m 47s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.conf.TestYarnConfigurationFields |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-6919 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12923344/YARN-6919.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 528af0ced50c 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 2d00a0c |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |

[jira] [Commented] (YARN-7340) Missing the time stamp in exception message in Class NoOverCommitPolicy

2018-05-14 Thread Dinesh Chitlangia (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474841#comment-16474841
 ] 

Dinesh Chitlangia commented on YARN-7340:
-

[~yufeigu] - Thank you for assigning this to me.

Are we looking to log only the start time of the requested resources, or both 
the start time and the end time?

I think we should log both the start and end times to avoid any ambiguity.

What are your thoughts on this?
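
For illustration, a minimal sketch of the fix, assuming the reservation's start 
and end times are available at that point (this is not the actual patch):

{code:java}
// Hypothetical sketch: include the time window in the message so the
// overcommit can be located without reading the code.
throw new ResourceOverCommitException(
    "Resources in the time range [" + reservation.getStartTime() + ", "
        + reservation.getEndTime() + "] would be overcommitted by "
        + "accepting reservation: " + reservation.getReservationId());
{code}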

> Missing the time stamp in exception message in Class NoOverCommitPolicy
> ---
>
> Key: YARN-7340
> URL: https://issues.apache.org/jira/browse/YARN-7340
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: reservation system
>Affects Versions: 3.1.0
>Reporter: Yufei Gu
>Assignee: Dinesh Chitlangia
>Priority: Minor
>  Labels: newbie++
>
> It could be easily figured out by reading code.
> {code}
>   throw new ResourceOverCommitException(
>   "Resources at time " + " would be overcommitted by "
>   + "accepting reservation: " + reservation.getReservationId());
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8081) Yarn Service Upgrade: Add support to upgrade a component

2018-05-14 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8081:

Attachment: YARN-8081.001.patch

> Yarn Service Upgrade: Add support to upgrade a component
> 
>
> Key: YARN-8081
> URL: https://issues.apache.org/jira/browse/YARN-8081
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8081.001.patch
>
>
> Yarn service upgrade should provide an API to upgrade the component.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8081) Yarn Service Upgrade: Add support to upgrade a component

2018-05-14 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8081:

Description: Yarn service upgrade should provide an API to upg

> Yarn Service Upgrade: Add support to upgrade a component
> 
>
> Key: YARN-8081
> URL: https://issues.apache.org/jira/browse/YARN-8081
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8081.001.patch
>
>
> Yarn service upgrade should provide an API to upg



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8081) Yarn Service Upgrade: Add support to upgrade a component

2018-05-14 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8081:

Description: Yarn service upgrade should provide an API to upgrade the 
component.  (was: Yarn service upgrade should provide an API to upg)

> Yarn Service Upgrade: Add support to upgrade a component
> 
>
> Key: YARN-8081
> URL: https://issues.apache.org/jira/browse/YARN-8081
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8081.001.patch
>
>
> Yarn service upgrade should provide an API to upgrade the component.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8123) Skip compiling old hamlet package when the Java version is 10 or upper

2018-05-14 Thread Dinesh Chitlangia (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474835#comment-16474835
 ] 

Dinesh Chitlangia commented on YARN-8123:
-

[~tasanuma0829] and [~ajisakaa] - Patch has been attached. Kindly help to 
review.

> Skip compiling old hamlet package when the Java version is 10 or upper
> --
>
> Key: YARN-8123
> URL: https://issues.apache.org/jira/browse/YARN-8123
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: webapp
> Environment: Java 10 or upper
>Reporter: Akira Ajisaka
>Assignee: Dinesh Chitlangia
>Priority: Major
>  Labels: newbie
> Attachments: YARN-8123.001.patch
>
>
> HADOOP-11423 skipped compiling old hamlet package when the Java version is 9, 
> however, it is not skipped with Java 10+. We need to fix it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8284) get_docker_command refactoring

2018-05-14 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474826#comment-16474826
 ] 

Eric Yang commented on YARN-8284:
-

+1 looks good to me.

> get_docker_command refactoring
> --
>
> Key: YARN-8284
> URL: https://issues.apache.org/jira/browse/YARN-8284
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.2.0, 3.1.1
>Reporter: Jason Lowe
>Assignee: Eric Badger
>Priority: Minor
> Attachments: YARN-8284.001.patch
>
>
> YARN-8274 occurred because get_docker_command's helper functions each have to 
> remember to put the docker binary as the first argument.  This is error prone 
> and causes code duplication for each of the helper functions.  It would be 
> safer and simpler if get_docker_command initialized the docker binary 
> argument in one place and each of the helper functions only added the 
> arguments specific to their particular docker sub-command.
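
The shape of the proposed refactor can be sketched as follows (in Java purely 
for illustration; the real code is C in container-executor, and the names here 
are made up):

{code:java}
// Illustrative pattern only: the docker binary is added in exactly one
// place, and helpers contribute only sub-command-specific arguments.
static List<String> getDockerCommand(String dockerBinary, String subCommand,
    List<String> subCommandArgs) {
  List<String> cmd = new ArrayList<>();
  cmd.add(dockerBinary);       // single place that knows about the binary
  cmd.add(subCommand);         // e.g. "run", "inspect", "rm"
  cmd.addAll(subCommandArgs);  // supplied by the per-sub-command helpers
  return cmd;
}
{code}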



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-05-14 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474808#comment-16474808
 ] 

Arun Suresh commented on YARN-8250:
---

[~haibochen], I am not entirely convinced we really need to make the 
ContainerScheduler pluggable. Maybe if you could provide a code snippet of 
how the new ContainerScheduler used for over-allocation needs to be different, 
I could have better context.

You have created a new {{SCHEDULE_CONTAINERS}} event. Wondering if 
{{SHED_QUEUED_CONTAINERS}} should be re-used here?


> Create another implementation of ContainerScheduler to support NM 
> overallocation
> 
>
> Key: YARN-8250
> URL: https://issues.apache.org/jira/browse/YARN-8250
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-8250-YARN-1011.00.patch, 
> YARN-8250-YARN-1011.01.patch, YARN-8250-YARN-1011.02.patch
>
>
> YARN-6675 adds NM over-allocation support by modifying the existing 
> ContainerScheduler and providing a utilizationBased resource tracker.
> However, the implementation adds a lot of complexity to ContainerScheduler, 
> and future tweaks of the over-allocation strategy based on how many containers 
> have been launched would be even more complicated.
> As such, this Jira proposes a new ContainerScheduler that always launches 
> guaranteed containers immediately and queues opportunistic containers. It 
> relies on a periodic check to launch opportunistic containers.
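
A rough sketch of the proposed behavior (class, method, and type usage below 
are simplified and hypothetical, not the actual YARN-8250 patch):

{code:java}
// Hypothetical sketch of the proposed scheduler behavior.
public class OverAllocationContainerSchedulerSketch {
  private final Queue<Container> opportunisticQueue = new LinkedList<>();

  /** Guaranteed containers launch immediately; opportunistic ones are queued. */
  public void scheduleContainer(Container container, boolean guaranteed) {
    if (guaranteed) {
      launch(container);
    } else {
      opportunisticQueue.add(container);
    }
  }

  /** Periodic check: launch queued opportunistic containers while capacity allows. */
  public void onPeriodicCheck(ResourceUtilization currentUtilization) {
    while (!opportunisticQueue.isEmpty() && hasSpareCapacity(currentUtilization)) {
      launch(opportunisticQueue.poll());
    }
  }

  private boolean hasSpareCapacity(ResourceUtilization utilization) {
    return true; // placeholder; the real check compares against the over-allocation threshold
  }

  private void launch(Container container) {
    // placeholder for the actual container launch path
  }
}
{code}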



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6919) Add default volume mount list

2018-05-14 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-6919:
--
Attachment: YARN-6919.001.patch

> Add default volume mount list
> -
>
> Key: YARN-6919
> URL: https://issues.apache.org/jira/browse/YARN-6919
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
>  Labels: Docker
> Attachments: YARN-6919.001.patch
>
>
> Piggybacking on YARN-5534, we should create a default list that bind mounts 
> selected volumes into all docker containers. This list will be empty by 
> default 
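
A minimal sketch of how such a default list could be consumed, assuming a 
hypothetical property name, "src:dst" entry format, and addMount(...) call 
(these are assumptions for illustration, not the actual YARN-6919 implementation):

{code:java}
// Hypothetical sketch: property name, entry format, and addMount(...) are
// assumptions for illustration only.
String defaults = conf.get(
    "yarn.nodemanager.runtime.linux.docker.default-ro-mounts", "");
for (String mount : defaults.split(",")) {
  mount = mount.trim();
  if (!mount.isEmpty()) {
    String[] parts = mount.split(":");          // e.g. /etc/passwd:/etc/passwd
    dockerRunCommand.addMount(parts[0], parts[1], /* readOnly= */ true);
  }
}
{code}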



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7933) [atsv2 read acls] Add TimelineWriter#writeDomain

2018-05-14 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474761#comment-16474761
 ] 

Haibo Chen commented on YARN-7933:
--

I am okay with just removing the TODO comment and having the discussion about 
appid authentication in a separate jira.

> [atsv2 read acls] Add TimelineWriter#writeDomain 
> -
>
> Key: YARN-7933
> URL: https://issues.apache.org/jira/browse/YARN-7933
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vrushali C
>Assignee: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-7933.01.patch, YARN-7933.02.patch, 
> YARN-7933.03.patch, YARN-7933.04.patch, YARN-7933.05.patch
>
>
>  
> Add an API TimelineWriter#writeDomain for writing the domain info 
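
A minimal sketch of what the new writer method could look like (the parameter 
and return types here are assumptions; the exact signature in the attached 
patches may differ):

{code:java}
// Hypothetical signature; parameter and return types are assumptions.
TimelineWriteResponse writeDomain(TimelineCollectorContext context,
    TimelineDomain domain) throws IOException;
{code}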



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7933) [atsv2 read acls] Add TimelineWriter#writeDomain

2018-05-14 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474759#comment-16474759
 ] 

Haibo Chen commented on YARN-7933:
--

If it's not in our design, I am inclined to remove it at this point in time. 
{quote}Timeline Token verification is at filter layer at the time of http 
connection establishment i.e even before it reaches servlets.
{quote}
Does that mean if two collectors are allocated on the same node, then one AM 
can forge data of another?  This is probably an independent issue that applies 
to the other TimelineCollectorWebservices endpoint,  putEntities(), which we 
can address in another jira.

> [atsv2 read acls] Add TimelineWriter#writeDomain 
> -
>
> Key: YARN-7933
> URL: https://issues.apache.org/jira/browse/YARN-7933
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vrushali C
>Assignee: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-7933.01.patch, YARN-7933.02.patch, 
> YARN-7933.03.patch, YARN-7933.04.patch, YARN-7933.05.patch
>
>
>  
> Add an API TimelineWriter#writeDomain for writing the domain info 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6677) Preempt all opportunistic containers when root container cgroup goes over memory limit

2018-05-14 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated YARN-6677:
-
Summary: Preempt all opportunistic containers when root container cgroup 
goes over memory limit  (was: Pause and preempt containers when root container 
cgroup goes over memory limit)

> Preempt all opportunistic containers when root container cgroup goes over 
> memory limit
> --
>
> Key: YARN-6677
> URL: https://issues.apache.org/jira/browse/YARN-6677
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Miklos Szegedi
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8250) Create another implementation of ContainerScheduler to support NM overallocation

2018-05-14 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474715#comment-16474715
 ] 

Haibo Chen commented on YARN-8250:
--

[~asuresh] Did you get a chance to look at the patch?

> Create another implementation of ContainerScheduler to support NM 
> overallocation
> 
>
> Key: YARN-8250
> URL: https://issues.apache.org/jira/browse/YARN-8250
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
> Attachments: YARN-8250-YARN-1011.00.patch, 
> YARN-8250-YARN-1011.01.patch, YARN-8250-YARN-1011.02.patch
>
>
> YARN-6675 adds NM over-allocation support by modifying the existing 
> ContainerScheduler and providing a utilizationBased resource tracker.
> However, the implementation adds a lot of complexity to ContainerScheduler, 
> and future tweaks of the over-allocation strategy based on how many containers 
> have been launched would be even more complicated.
> As such, this Jira proposes a new ContainerScheduler that always launches 
> guaranteed containers immediately and queues opportunistic containers. It 
> relies on a periodic check to launch opportunistic containers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8284) get_docker_command refactoring

2018-05-14 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474712#comment-16474712
 ] 

genericqa commented on YARN-8284:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
35s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
39m  6s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 34s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 
30s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 73m 40s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8284 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12923323/YARN-8284.001.patch |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux 89f4b87ecdb5 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 2d00a0c |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20718/testReport/ |
| Max. process+thread count | 334 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20718/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> get_docker_command refactoring
> --
>
> Key: YARN-8284
> URL: https://issues.apache.org/jira/browse/YARN-8284
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.2.0, 3.1.1
>Reporter: Jason Lowe
>Assignee: Eric Badger
>Priority: Minor
> Attachments: YARN-8284.001.patch
>
>
> YARN-8274 occurred because get_docker_command's helper functions each have to 
> remember to put the docker binary as the first argument.  This is error prone 
> and causes code duplication for each of the helper functions.  It would be 
> safer and simpler if get_docker_command initialized the docker binary 
> argument in one place and each of the helper functions only added the 
> arguments specific to their particular docker sub-command.

[jira] [Commented] (YARN-8130) Race condition when container events are published for KILLED applications

2018-05-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474608#comment-16474608
 ] 

Hudson commented on YARN-8130:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14195 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14195/])
YARN-8130 Race condition when container events are published for KILLED 
(haibochen: rev 2d00a0c71b5dde31e2cf8fcb96d9d541d41fb879)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/timelineservice/NMTimelinePublisher.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/timelineservice/NMTimelineEvent.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/timelineservice/NMTimelineEventType.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/timelineservice/TestNMTimelinePublisher.java


> Race condition when container events are published for KILLED applications
> --
>
> Key: YARN-8130
> URL: https://issues.apache.org/jira/browse/YARN-8130
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Reporter: Charan Hebri
>Assignee: Rohith Sharma K S
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8130.01.patch, YARN-8130.02.patch, 
> YARN-8130.03.patch
>
>
> There seems to be a race condition happening when an application is KILLED 
> and the corresponding container event information is being published. For 
> completed containers, a YARN_CONTAINER_FINISHED event is generated but for 
> some containers in a KILLED application this information is missing. Below is 
> a node manager log snippet,
> {code:java}
> 2018-04-09 08:44:54,474 INFO  shuffle.ExternalShuffleBlockResolver 
> (ExternalShuffleBlockResolver.java:applicationRemoved(186)) - Application 
> application_1523259757659_0003 removed, cleanupLocalDirs = false
> 2018-04-09 08:44:54,478 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(632)) - Application 
> application_1523259757659_0003 transitioned from 
> APPLICATION_RESOURCES_CLEANINGUP to FINISHED
> 2018-04-09 08:44:54,478 ERROR timelineservice.NMTimelinePublisher 
> (NMTimelinePublisher.java:putEntity(298)) - Seems like client has been 
> removed before the entity could be published for 
> TimelineEntity[type='YARN_CONTAINER', 
> id='container_1523259757659_0003_01_02']
> 2018-04-09 08:44:54,478 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:finishLogAggregation(520)) - Application just 
> finished : application_1523259757659_0003
> 2018-04-09 08:44:54,488 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs 
> for container container_1523259757659_0003_01_01. Current good log dirs 
> are /grid/0/hadoop/yarn/log
> 2018-04-09 08:44:54,492 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs 
> for container container_1523259757659_0003_01_02. Current good log dirs 
> are /grid/0/hadoop/yarn/log
> 2018-04-09 08:44:55,470 INFO  collector.TimelineCollectorManager 
> (TimelineCollectorManager.java:remove(192)) - The collector service for 
> application_1523259757659_0003 was removed
> 2018-04-09 08:44:55,472 INFO  containermanager.ContainerManagerImpl 
> (ContainerManagerImpl.java:handle(1572)) - couldn't find application 
> application_1523259757659_0003 while processing FINISH_APPS event. The 
> ResourceManager allocated resources for this application to the NodeManager 
> but no active containers were found to process{code}
> The container id specified in the log, 
> *container_1523259757659_0003_01_02* is the one that has the finished 
> event missing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8284) get_docker_command refactoring

2018-05-14 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger updated YARN-8284:
--
Attachment: YARN-8284.001.patch

> get_docker_command refactoring
> --
>
> Key: YARN-8284
> URL: https://issues.apache.org/jira/browse/YARN-8284
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.2.0, 3.1.1
>Reporter: Jason Lowe
>Assignee: Eric Badger
>Priority: Minor
> Attachments: YARN-8284.001.patch
>
>
> YARN-8274 occurred because get_docker_command's helper functions each have to 
> remember to put the docker binary as the first argument.  This is error prone 
> and causes code duplication for each of the helper functions.  It would be 
> safer and simpler if get_docker_command initialized the docker binary 
> argument in one place and each of the helper functions only added the 
> arguments specific to their particular docker sub-command.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8130) Race condition when container events are published for KILLED applications

2018-05-14 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474559#comment-16474559
 ] 

Vrushali C commented on YARN-8130:
--

thanks [~haibochen] , please go ahead

> Race condition when container events are published for KILLED applications
> --
>
> Key: YARN-8130
> URL: https://issues.apache.org/jira/browse/YARN-8130
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Reporter: Charan Hebri
>Assignee: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-8130.01.patch, YARN-8130.02.patch, 
> YARN-8130.03.patch
>
>
> There seems to be a race condition happening when an application is KILLED 
> and the corresponding container event information is being published. For 
> completed containers, a YARN_CONTAINER_FINISHED event is generated but for 
> some containers in a KILLED application this information is missing. Below is 
> a node manager log snippet,
> {code:java}
> 2018-04-09 08:44:54,474 INFO  shuffle.ExternalShuffleBlockResolver 
> (ExternalShuffleBlockResolver.java:applicationRemoved(186)) - Application 
> application_1523259757659_0003 removed, cleanupLocalDirs = false
> 2018-04-09 08:44:54,478 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(632)) - Application 
> application_1523259757659_0003 transitioned from 
> APPLICATION_RESOURCES_CLEANINGUP to FINISHED
> 2018-04-09 08:44:54,478 ERROR timelineservice.NMTimelinePublisher 
> (NMTimelinePublisher.java:putEntity(298)) - Seems like client has been 
> removed before the entity could be published for 
> TimelineEntity[type='YARN_CONTAINER', 
> id='container_1523259757659_0003_01_02']
> 2018-04-09 08:44:54,478 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:finishLogAggregation(520)) - Application just 
> finished : application_1523259757659_0003
> 2018-04-09 08:44:54,488 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs 
> for container container_1523259757659_0003_01_01. Current good log dirs 
> are /grid/0/hadoop/yarn/log
> 2018-04-09 08:44:54,492 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs 
> for container container_1523259757659_0003_01_02. Current good log dirs 
> are /grid/0/hadoop/yarn/log
> 2018-04-09 08:44:55,470 INFO  collector.TimelineCollectorManager 
> (TimelineCollectorManager.java:remove(192)) - The collector service for 
> application_1523259757659_0003 was removed
> 2018-04-09 08:44:55,472 INFO  containermanager.ContainerManagerImpl 
> (ContainerManagerImpl.java:handle(1572)) - couldn't find application 
> application_1523259757659_0003 while processing FINISH_APPS event. The 
> ResourceManager allocated resources for this application to the NodeManager 
> but no active containers were found to process{code}
> The container id specified in the log, 
> *container_1523259757659_0003_01_02* is the one that has the finished 
> event missing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8130) Race condition when container events are published for KILLED applications

2018-05-14 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474552#comment-16474552
 ] 

Haibo Chen commented on YARN-8130:
--

Checking this in later today if no objection

> Race condition when container events are published for KILLED applications
> --
>
> Key: YARN-8130
> URL: https://issues.apache.org/jira/browse/YARN-8130
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2
>Reporter: Charan Hebri
>Assignee: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-8130.01.patch, YARN-8130.02.patch, 
> YARN-8130.03.patch
>
>
> There seems to be a race condition happening when an application is KILLED 
> and the corresponding container event information is being published. For 
> completed containers, a YARN_CONTAINER_FINISHED event is generated but for 
> some containers in a KILLED application this information is missing. Below is 
> a node manager log snippet,
> {code:java}
> 2018-04-09 08:44:54,474 INFO  shuffle.ExternalShuffleBlockResolver 
> (ExternalShuffleBlockResolver.java:applicationRemoved(186)) - Application 
> application_1523259757659_0003 removed, cleanupLocalDirs = false
> 2018-04-09 08:44:54,478 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(632)) - Application 
> application_1523259757659_0003 transitioned from 
> APPLICATION_RESOURCES_CLEANINGUP to FINISHED
> 2018-04-09 08:44:54,478 ERROR timelineservice.NMTimelinePublisher 
> (NMTimelinePublisher.java:putEntity(298)) - Seems like client has been 
> removed before the entity could be published for 
> TimelineEntity[type='YARN_CONTAINER', 
> id='container_1523259757659_0003_01_02']
> 2018-04-09 08:44:54,478 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:finishLogAggregation(520)) - Application just 
> finished : application_1523259757659_0003
> 2018-04-09 08:44:54,488 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs 
> for container container_1523259757659_0003_01_01. Current good log dirs 
> are /grid/0/hadoop/yarn/log
> 2018-04-09 08:44:54,492 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs 
> for container container_1523259757659_0003_01_02. Current good log dirs 
> are /grid/0/hadoop/yarn/log
> 2018-04-09 08:44:55,470 INFO  collector.TimelineCollectorManager 
> (TimelineCollectorManager.java:remove(192)) - The collector service for 
> application_1523259757659_0003 was removed
> 2018-04-09 08:44:55,472 INFO  containermanager.ContainerManagerImpl 
> (ContainerManagerImpl.java:handle(1572)) - couldn't find application 
> application_1523259757659_0003 while processing FINISH_APPS event. The 
> ResourceManager allocated resources for this application to the NodeManager 
> but no active containers were found to process{code}
> The container id specified in the log, 
> *container_1523259757659_0003_01_02* is the one that has the finished 
> event missing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups

2018-05-14 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474543#comment-16474543
 ] 

Haibo Chen commented on YARN-4599:
--

Thanks [~miklos.szeg...@cloudera.com] for the patch! The 
TestContainersMonitor.testContainerKillOnMemoryOverflow failure seems related.

I have a few comments/questions:

1) Error handling when writing to {{cgroup.event_control}} fails seems to be 
missing in oom_listener.c; do we need to handle such a case?

2) Is 'descriptors->event_control_fd = -1;'  necessary?

3) The comments for test_oom() do not quite make sense to me. My current 
understanding is that it adds the calling process to the given pgroup and 
simulates an OOM by repeatedly asking the OS for memory?

4) Can you please elaborate on how cgroup simulation is done in 
oom_listener_test_main.c? The child process that is added to the cgroup only 
does sleep().

5) Doing param matching in CGroupsHandlerImpl.GetCGroupParam() does not seem 
like good practice to me. Does it make sense to create a new method for the 
special case?

6) Let's wrap the new thread join in ContainersMonitorImpl with a try-catch 
clause as we do with the monitoring thread.

7) The configuration changes are incompatible in that, before the patch, 
poll-based pmcheck and vmcheck take precedence over the cgroup-based memory 
control mechanism. This is reversed after the patch: if cgroup-based memory 
control is enabled, then poll-based pmcheck and vmcheck are disabled 
automatically. IIUC, one of the reasons is that we need to reuse the pmcheck 
and vmcheck flags, which are dedicated to controlling the poll-based memory 
control. How about we create separate configurations for pm elastic control and 
vm elastic control? We can make sure they are mutually exclusive, as indicated 
in CGroupElasticMemoryController. We want to keep the elastic memory control 
mechanism independent of the per-container memory control mechanism, so we can 
get rid of the shortcut in checkLimit() (warnings are probably more 
appropriate if we want to say the poll-based mechanism is not robust, which is 
an issue unrelated to what we are doing here).

8) In CGroupElasticMemoryController, we can create a createOomHandler() method 
that is called by the constructor and overridden by the unit tests, to avoid the 
test-only setOomHandler() method (a rough sketch is included after this list).

9) bq.   // TODO could we miss an event before the process starts?

This is no longer an issue based on your experiment, per our offline discussion?

10) We only need two threads in the thread pool, one for reading the error 
stream and the other for watching and logging OOM state, don't we? If so, we 
can change executor = Executors.newFixedThreadPool(5); => executor = 
Executors.newFixedThreadPool(2);

11) I'm not quite sure how the watchdog thread can tell the elastic memory 
controller to stop. I guess once the watchdog thread calls stopListening(), the 
process is destroyed, `(read = events.read(event))` would return false, and we'd 
realize in the memory controller thread that the OOM was not resolved in time 
and throw an exception to crash the NM?  This flow seems pretty obscure to me. 
Does it make sense to have the stopListening logic in the `if (!watchdog.get) {}` 
block instead?

12) Can we replace the thrown.expect() statements with @Test(expected = ...), 
which is more declarative? Similarly in TestDefaultOOMHandler.

13) In TestCGroupElasticMemoryController.testNormalExit(), not quite sure what 
the purpose of the sleep task is. Can you please add some comments there?

14) Can we add some comments to DefaultOOMHandler javadoc, especially which 
containers are considered to be killed first.

15) If new YarnRuntimeException("Could not find any containers but CGroups " + 
"reserved for containers ran out of memory. " + "I am giving up") is thrown in 
DefaultOOMHandler, CGroupElasticMemoryController simply logs the exception. Do 
we want to crash the NM as well in this case?

16) In TestDefaultOOMHandler.testBothContainersOOM(), I think we also need to 
verify container 2 is killed. Similarly, in  testOneContainerOOM() and  
testNoContainerOOM().
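
For point 8, a hypothetical sketch of the factory-method approach (class names 
and the DefaultOOMHandler constructor shown here are illustrative, not the 
actual patch):

{code:java}
// Hypothetical sketch for point 8: a protected factory method that the
// constructor calls and unit tests override, instead of a test-only setter.
public class CGroupElasticMemoryControllerSketch {
  private final Runnable oomHandler;

  public CGroupElasticMemoryControllerSketch(Configuration conf) {
    this.oomHandler = createOomHandler(conf);
  }

  // Tests subclass the controller and override this to inject a mock handler.
  protected Runnable createOomHandler(Configuration conf) {
    return new DefaultOOMHandler(conf);
  }
}
{code}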

 

 

> Set OOM control for memory cgroups
> --
>
> Key: YARN-4599
> URL: https://issues.apache.org/jira/browse/YARN-4599
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Assignee: Miklos Szegedi
>Priority: Major
>  Labels: oct16-medium
> Attachments: Elastic Memory Control in YARN.pdf, YARN-4599.000.patch, 
> YARN-4599.001.patch, YARN-4599.002.patch, YARN-4599.003.patch, 
> YARN-4599.004.patch, YARN-4599.005.patch, YARN-4599.006.patch, 
> YARN-4599.sandflee.patch, yarn-4599-not-so-useful.patch
>
>
> YARN-1856 adds memory cgroups enforcing support. We should also expli

[jira] [Commented] (YARN-8108) RM metrics rest API throws GSSException in kerberized environment

2018-05-14 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474510#comment-16474510
 ] 

Eric Yang commented on YARN-8108:
-

Kerberos SPN support, as defined by browsers, is:

HTTP/<server>, where <server> is either a whitelisted server name or the 
canonical DNS name of the server.  Chrome, IE, and Firefox all share similar 
logic.  Firefox and IE don't allow canonical DNS, to prevent MITM attacks.  
Safari and Chrome support canonical DNS, with options to disable it.

From the server's point of view, a single server can host multiple virtual hosts 
with different web applications.  It is technically possible to configure a web 
server to run with multiple SPNs.  It is incorrect to assume that the same 
virtual host can serve two different SPNs for two different subsets of URLs.  
No browser supports serving one subset of URLs with one SPN while another 
subset of URLs is served by another SPN.

In Hadoop 0.2x, Hadoop components were designed to serve a collection of 
servlets (log, static, cluster) per port.  Therefore, AuthenticationFilter could 
cover the entire port by targeting that fixed set of servlets for filtering, 
which matched browser expectations without problems.  AuthenticationFilter was 
later reused in Hadoop 1.x and 2.x as the Kerberos SPNEGO filter.

The current problem only surfaces when multiple web contexts are configured 
to share the same port with the same server hostname, and each web context 
tries to initialize its own SPN.  This was not by design; it just happened 
due to code reuse and a lack of testing.  For Hadoop 2.x+ to offer embedded 
services securely, the individual AuthenticationFilters could be turned into one 
[security 
handler|http://www.eclipse.org/jetty/documentation/9.3.x/architecture.html#_handlers]
 to match the Jetty design specification.  This fell through the cracks in open 
source when no one was looking, because the first security mechanism for Hadoop 
was an XSS filter (committed as part of Chukwa) instead of a security handler.  
Unfortunately, Hadoop security mechanisms followed a bottom-up approach and were 
implemented as filters, instead of following web application design and writing 
security handlers as Handlers, partly due to a lack of understanding that 
session persistence requires authentication and authorization mechanisms to be 
built differently from web filters.

The one-line change is to loop through all contexts and ensure they are all 
registered with the same AuthenticationFilter, applying one filter globally to 
all URLs.  This is why the one-line patch can plug this security hole as a 
short-term bug fix.  The long-term solution is to write a security handler that 
matches the Jetty handler design, to ensure no API breakage during Jetty version 
upgrades and to improve session persistence in Hadoop web applications, which is 
beyond the scope of this JIRA.
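
A minimal sketch of that idea, using standard Jetty servlet-context APIs (this 
is not the actual Hadoop patch; real code would also need to pass the SPNEGO 
init parameters such as the Kerberos principal and keytab):

{code:java}
import java.util.EnumSet;
import javax.servlet.DispatcherType;
import org.apache.hadoop.security.authentication.server.AuthenticationFilter;
import org.eclipse.jetty.servlet.ServletContextHandler;

// Minimal sketch, not the actual patch: register the same authentication
// filter on every web context so a single filter covers all URLs on the port.
final class GlobalAuthFilterSketch {
  static void secureAllContexts(Iterable<ServletContextHandler> contexts) {
    for (ServletContextHandler context : contexts) {
      // "/*" applies the SPNEGO filter to every URL within this context.
      context.addFilter(AuthenticationFilter.class, "/*",
          EnumSet.of(DispatcherType.REQUEST));
    }
  }
}
{code}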

> RM metrics rest API throws GSSException in kerberized environment
> -
>
> Key: YARN-8108
> URL: https://issues.apache.org/jira/browse/YARN-8108
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Kshitij Badani
>Assignee: Eric Yang
>Priority: Blocker
> Attachments: YARN-8108.001.patch
>
>
> Test is trying to pull up metrics data from SHS after kiniting as 'test_user'
> It is throwing GSSException as follows
> {code:java}
> b2b460b80713|RUNNING: curl --silent -k -X GET -D 
> /hwqe/hadoopqe/artifacts/tmp-94845 --negotiate -u : 
> http://rm_host:8088/proxy/application_1518674952153_0070/metrics/json2018-02-15
>  07:15:48,757|INFO|MainThread|machine.py:194 - 
> run()||GUID=fc5a3266-28f8-4eed-bae2-b2b460b80713|Exit Code: 0
> 2018-02-15 07:15:48,758|INFO|MainThread|spark.py:1757 - 
> getMetricsJsonData()|metrics:
> 
> 
> 
> Error 403 GSSException: Failure unspecified at GSS-API level 
> (Mechanism level: Request is a replay (34))
> 
> HTTP ERROR 403
> Problem accessing /proxy/application_1518674952153_0070/metrics/json. 
> Reason:
>  GSSException: Failure unspecified at GSS-API level (Mechanism level: 
> Request is a replay (34))
> 
> 
> {code}
> Root cause: the proxy server on RM can't be supported for a Kerberos-enabled 
> cluster because AuthenticationFilter is applied twice in the Hadoop code (once 
> in HttpServer2 for RM, and another instance from AmFilterInitializer for the 
> proxy server). This will require code changes to the hadoop-yarn-server-web-proxy 
> project.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8289) Modify distributedshell to support Node Attributes

2018-05-14 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474472#comment-16474472
 ] 

Naganarasimha G R commented on YARN-8289:
-

Based on the Scheduler API, we need to modify Distributed Shell to support 
NodeAttributes.

> Modify distributedshell to support Node Attributes
> --
>
> Key: YARN-8289
> URL: https://issues.apache.org/jira/browse/YARN-8289
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: distributed-shell
>Affects Versions: YARN-3409
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Major
>
> Modifications required in Distributed shell to support NodeAttributes



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8289) Modify distributedshell to support Node Attributes

2018-05-14 Thread Naganarasimha G R (JIRA)
Naganarasimha G R created YARN-8289:
---

 Summary: Modify distributedshell to support Node Attributes
 Key: YARN-8289
 URL: https://issues.apache.org/jira/browse/YARN-8289
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: distributed-shell
Affects Versions: YARN-3409
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R


Modifications required in Distributed shell to support NodeAttributes



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7863) Modify placement constraints to support node attributes

2018-05-14 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474464#comment-16474464
 ] 

Naganarasimha G R commented on YARN-7863:
-

Thanks [~sunilg], I will support you in the review of it!

> Modify placement constraints to support node attributes
> ---
>
> Key: YARN-7863
> URL: https://issues.apache.org/jira/browse/YARN-7863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Major
>
> This Jira will track to *Modify existing placement constraints to support 
> node attributes.*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8284) get_docker_command refactoring

2018-05-14 Thread Eric Badger (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Badger reassigned YARN-8284:
-

Assignee: Eric Badger

> get_docker_command refactoring
> --
>
> Key: YARN-8284
> URL: https://issues.apache.org/jira/browse/YARN-8284
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 3.2.0, 3.1.1
>Reporter: Jason Lowe
>Assignee: Eric Badger
>Priority: Minor
>
> YARN-8274 occurred because get_docker_command's helper functions each have to 
> remember to put the docker binary as the first argument.  This is error prone 
> and causes code duplication for each of the helper functions.  It would be 
> safer and simpler if get_docker_command initialized the docker binary 
> argument in one place and each of the helper functions only added the 
> arguments specific to their particular docker sub-command.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8288) Fix wrong number of table columns in Resource Model doc

2018-05-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474436#comment-16474436
 ] 

Hudson commented on YARN-8288:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14191 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14191/])
YARN-8288. Fix wrong number of table columns in Resource Model doc. 
(naganarasimha_gr: rev 8a2b5914f3a68148f40f99105acf5dafcc326e89)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceModel.md


> Fix wrong number of table columns in Resource Model doc
> ---
>
> Key: YARN-8288
> URL: https://issues.apache.org/jira/browse/YARN-8288
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Fix For: 3.1.1, 3.0.3
>
> Attachments: YARN-8288.001.patch, after.jpg, before.jpg
>
>
> In resource model doc, resource-types.xml and node-resource.xml description 
> table has wrong number of columns defined, see 
> [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceModel.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8275) Create a JNI interface to interact with Windows

2018-05-14 Thread JIRA

[ 
https://issues.apache.org/jira/browse/YARN-8275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474424#comment-16474424
 ] 

Íñigo Goiri commented on YARN-8275:
---

[~aw], thanks for the feedback, much appreciated.
It looks like we can put everything you proposed together into an umbrella for 
fixing the way Hadoop interacts with Windows.
From this thread, I see:
* Move away from external processes (winutils.exe) for native code:
** Replace with native Java APIs (e.g., symlinks)
** Replace with something like JNI or so
* Fix the build system to fully leverage cmake instead of msbuild

I would create an umbrella for this bigger task and make this JIRA just a 
subtask focusing on the YARN side (e.g., task).

> Create a JNI interface to interact with Windows
> ---
>
> Key: YARN-8275
> URL: https://issues.apache.org/jira/browse/YARN-8275
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
>Priority: Major
> Attachments: WinUtils-Functions.pdf, WinUtils.CSV
>
>
> I did a quick investigation of the performance of WinUtils in YARN. On 
> average, the NM calls it 4.76 times per second and 65.51 times per container.
>  
> | |Requests|Requests/sec|Requests/min|Requests/container|
> |*Sum [WinUtils]*|*135354*|*4.761*|*286.160*|*65.51*|
> |[WinUtils] Execute -help|4148|0.145|8.769|2.007|
> |[WinUtils] Execute -ls|2842|0.0999|6.008|1.37|
> |[WinUtils] Execute -systeminfo|9153|0.321|19.35|4.43|
> |[WinUtils] Execute -symlink|115096|4.048|243.33|57.37|
> |[WinUtils] Execute -task isAlive|4115|0.144|8.699|2.05|
>  Interval: 7 hours, 53 minutes and 48 seconds
> Each execution of WinUtils does around *140 IO ops*, of which 130 are DDL ops.
> This means *666.58* IO ops/second due to WinUtils.
> We should start considering removing WinUtils from Hadoop and creating a JNI 
> interface.
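
As a rough illustration of what such a JNI surface could look like (the library 
and method names below are entirely hypothetical, not part of this JIRA):

{code:java}
// Hypothetical sketch of a JNI wrapper replacing winutils.exe invocations.
public final class WindowsNative {
  static {
    System.loadLibrary("hadoopwinnative");   // assumed native library name
  }

  private WindowsNative() {}

  // Candidates for the most frequent winutils invocations listed above.
  public static native boolean createSymlink(String link, String target);
  public static native boolean isTaskAlive(String taskName);
  public static native String getSystemInfo();
}
{code}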



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7494) Add muti node lookup support for better placement

2018-05-14 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474418#comment-16474418
 ] 

Sunil G commented on YARN-7494:
---

Sorry for the delay here [~cheersyang]

I am working on a patch to address your comments and the UT. Thank you.

> Add muti node lookup support for better placement
> -
>
> Key: YARN-7494
> URL: https://issues.apache.org/jira/browse/YARN-7494
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Major
> Attachments: YARN-7494.001.patch, YARN-7494.002.patch, 
> YARN-7494.003.patch, YARN-7494.004.patch, YARN-7494.005.patch, 
> YARN-7494.006.patch, YARN-7494.007.patch, YARN-7494.v0.patch, 
> YARN-7494.v1.patch, multi-node-designProposal.png
>
>
> Instead of single node, for effectiveness we can consider a multi node lookup 
> based on partition to start with.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8271) [UI2] Improve labeling of certain tables

2018-05-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474410#comment-16474410
 ] 

Hudson commented on YARN-8271:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14190 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14190/])
YARN-8271. [UI2] Improve labeling of certain tables. Contributed by (sunilg: 
rev 89d0b87ad324db09f14e00031d20635083d576ed)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/components/node-menu-panel.hbs
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/components/yarn-queue/capacity-queue-info.hbs
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/cluster-overview.hbs
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/components/yarn-queue/fair-queue-info.hbs
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/components/yarn-queue/fair-queue.hbs
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/yarn-tools.hbs
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/controllers/yarn-tools.js
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/helpers/node-menu.js
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/templates/components/yarn-queue/capacity-queue.hbs


> [UI2] Improve labeling of certain tables
> 
>
> Key: YARN-8271
> URL: https://issues.apache.org/jira/browse/YARN-8271
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8271.0001.patch
>
>
> Update labeling for a few items to avoid confusion
>  - Cluster Page (/cluster-overview):
>  -- "Finished apps" --> "Finished apps from all users"
>  -- "Running apps" --> "Running apps from all users"
>  - Queues overview page (/yarn-queues/root) && Per queue page 
> (/yarn-queue/root/apps)
>  -- "Running Apps" --> "Running apps from all users in queue "
>  - Nodes Page - side bar for all pages 
>  -- "List of Applications" --> "List of Applications on this node"
>  -- "List of Containers" --> "List of Containers on this node"
>  - Yarn Tools
>  ** Yarn Tools --> YARN Tools
>  - Queue page
>  ** Running Apps: --> Running Apps From All Users



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8266) Clicking on application from cluster view should redirect to application attempt page

2018-05-14 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474380#comment-16474380
 ] 

Sunil G commented on YARN-8266:
---

Looks straightforward. Committing shortly.

> Clicking on application from cluster view should redirect to application 
> attempt page
> -
>
> Key: YARN-8266
> URL: https://issues.apache.org/jira/browse/YARN-8266
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: YARN-8266.001.patch
>
>
> Steps:
> 1) Start one application
>  2) Go to cluster overview page
>  3) Click on applicationId from Cluster Resource Usage By Application
> This action redirects to the 
> [http://xxx:8088/ui2/#/yarn-app/application_1525740862939_0005] URL. This is 
> an invalid URL and does not show any details.
> Instead, it should redirect to the attempt page.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8288) Fix wrong number of table columns in Resource Model doc

2018-05-14 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474364#comment-16474364
 ] 

Naganarasimha G R commented on YARN-8288:
-

Thanks [~cheersyang], I agree with you. Committing this patch.

> Fix wrong number of table columns in Resource Model doc
> ---
>
> Key: YARN-8288
> URL: https://issues.apache.org/jira/browse/YARN-8288
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8288.001.patch, after.jpg, before.jpg
>
>
> In the resource model doc, the resource-types.xml and node-resource.xml 
> description tables have the wrong number of columns defined, see 
> [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceModel.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8271) [UI2] Improve labeling of certain tables

2018-05-14 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-8271:
--
Summary: [UI2] Improve labeling of certain tables  (was: Change UI2 
labeling of certain tables to avoid confusion)

> [UI2] Improve labeling of certain tables
> 
>
> Key: YARN-8271
> URL: https://issues.apache.org/jira/browse/YARN-8271
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-ui-v2
>Affects Versions: 3.1.0
>Reporter: Yesha Vora
>Assignee: Yesha Vora
>Priority: Major
> Attachments: YARN-8271.0001.patch
>
>
> Update labeling for a few items to avoid confusion
>  - Cluster Page (/cluster-overview):
>  -- "Finished apps" --> "Finished apps from all users"
>  -- "Running apps" --> "Running apps from all users"
>  - Queues overview page (/yarn-queues/root) && Per queue page 
> (/yarn-queue/root/apps)
>  -- "Running Apps" --> "Running apps from all users in queue "
>  - Nodes Page - side bar for all pages 
>  -- "List of Applications" --> "List of Applications on this node"
>  -- "List of Containers" --> "List of Containers on this node"
>  - Yarn Tools
>  ** Yarn Tools --> YARN Tools
>  - Queue page
>  ** Running Apps: --> Running Apps From All Users



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8288) Fix wrong number of table columns in Resource Model doc

2018-05-14 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474175#comment-16474175
 ] 

genericqa commented on YARN-8288:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
47s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 42m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
56m 10s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 37s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 69m 43s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-8288 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12923278/YARN-8288.001.patch |
| Optional Tests |  asflicense  mvnsite  |
| uname | Linux 557a0f702bc9 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f3f544b |
| maven | version: Apache Maven 3.3.9 |
| Max. process+thread count | 334 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20717/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Fix wrong number of table columns in Resource Model doc
> ---
>
> Key: YARN-8288
> URL: https://issues.apache.org/jira/browse/YARN-8288
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8288.001.patch, after.jpg, before.jpg
>
>
> In the resource model doc, the resource-types.xml and node-resource.xml 
> description tables have the wrong number of columns defined, see 
> [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceModel.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7863) Modify placement constraints to support node attributes

2018-05-14 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474124#comment-16474124
 ] 

Sunil G commented on YARN-7863:
---

Given that YARN-7892 is now committed, I will update this patch shortly.

> Modify placement constraints to support node attributes
> ---
>
> Key: YARN-7863
> URL: https://issues.apache.org/jira/browse/YARN-7863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Sunil G
>Assignee: Sunil G
>Priority: Major
>
> This Jira will track the work to *modify existing placement constraints to 
> support node attributes*.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8288) Fix wrong number of table columns in Resource Model doc

2018-05-14 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474105#comment-16474105
 ] 

Weiwei Yang commented on YARN-8288:
---

Hi [~Naganarasimha]

Those are user-defined properties, so there are no default values for them. And the 
latter section gives an example of how to configure them, which is pretty 
self-explanatory. What do you think?
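
For readers who land here from the doc, the user-defined entries typically look roughly 
like the following (the resource name {{resource1}} and its unit are made-up example 
values; the property names are the ones described in the ResourceModel documentation):

{code:xml}
<!-- resource-types.xml (ResourceManager side): declare the custom resource -->
<configuration>
  <property>
    <name>yarn.resource-types</name>
    <value>resource1</value>
  </property>
  <property>
    <name>yarn.resource-types.resource1.units</name>
    <value>G</value>
  </property>
</configuration>
{code}

{code:xml}
<!-- node-resources.xml (NodeManager side): how much of that resource this node offers -->
<configuration>
  <property>
    <name>yarn.nodemanager.resource-type.resource1</name>
    <value>5G</value>
  </property>
</configuration>
{code}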

> Fix wrong number of table columns in Resource Model doc
> ---
>
> Key: YARN-8288
> URL: https://issues.apache.org/jira/browse/YARN-8288
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8288.001.patch, after.jpg, before.jpg
>
>
> In the resource model doc, the resource-types.xml and node-resource.xml 
> description tables have the wrong number of columns defined, see 
> [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceModel.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8288) Fix wrong number of table columns in Resource Model doc

2018-05-14 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474102#comment-16474102
 ] 

Naganarasimha G R commented on YARN-8288:
-

[~cheersyang], should we not be putting the default value under the "Value" column 
and pushing the description into the actual "Description" column?

> Fix wrong number of table columns in Resource Model doc
> ---
>
> Key: YARN-8288
> URL: https://issues.apache.org/jira/browse/YARN-8288
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8288.001.patch, after.jpg, before.jpg
>
>
> In the resource model doc, the resource-types.xml and node-resource.xml 
> description tables have the wrong number of columns defined, see 
> [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceModel.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7892) Revisit NodeAttribute class structure

2018-05-14 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474099#comment-16474099
 ] 

Naganarasimha G R commented on YARN-7892:
-

Thanks for the review [~bibinchundatt], [~sunilg] & [~leftnoteasy] and the 
commit by [~bibinchundatt].

> Revisit NodeAttribute class structure
> -
>
> Key: YARN-7892
> URL: https://issues.apache.org/jira/browse/YARN-7892
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Major
> Fix For: YARN-3409
>
> Attachments: YARN-7892-YARN-3409.001.patch, 
> YARN-7892-YARN-3409.002.patch, YARN-7892-YARN-3409.003.WIP.patch, 
> YARN-7892-YARN-3409.003.patch, YARN-7892-YARN-3409.004.patch, 
> YARN-7892-YARN-3409.005.patch, YARN-7892-YARN-3409.006.patch, 
> YARN-7892-YARN-3409.007.patch, YARN-7892-YARN-3409.008.patch, 
> YARN-7892-YARN-3409.009.patch, YARN-7892-YARN-3409.010.patch
>
>
> In the existing structure, we had kept the type and value along with the 
> attribute, which would confuse users of the APIs: it would not be clear what 
> needs to be sent for type and value while fetching the mappings for node(s).
> Also, equals would not make sense if we compared only the prefix and name, 
> whereas their values might be different.
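
A hypothetical illustration of the distinction being made (the class and field names 
below are illustrative only, not the actual YARN NodeAttribute API): equality is 
defined over the identifying part (prefix + name) alone, so the value lives outside 
the key and two mappings for the same key compare equal even if their values differ.

{code:java}
import java.util.Objects;

// Illustrative only -- not the actual NodeAttribute class from YARN-7892.
public final class AttributeKey {
  private final String prefix;  // e.g. a system or user namespace
  private final String name;    // e.g. "os", "java-version"

  public AttributeKey(String prefix, String name) {
    this.prefix = prefix;
    this.name = name;
  }

  // equals/hashCode intentionally ignore type and value: the key alone
  // identifies the attribute, and the value is carried in a separate object.
  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof AttributeKey)) {
      return false;
    }
    AttributeKey other = (AttributeKey) o;
    return Objects.equals(prefix, other.prefix)
        && Objects.equals(name, other.name);
  }

  @Override
  public int hashCode() {
    return Objects.hash(prefix, name);
  }
}
{code}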



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8288) Fix wrong number of table columns in Resource Model doc

2018-05-14 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474079#comment-16474079
 ] 

Weiwei Yang commented on YARN-8288:
---

Uploaded the patch to fix this, targeting 3.1.1. Please see the attached screenshots 
from before and after the patch is applied.

> Fix wrong number of table columns in Resource Model doc
> ---
>
> Key: YARN-8288
> URL: https://issues.apache.org/jira/browse/YARN-8288
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8288.001.patch, after.jpg, before.jpg
>
>
> In the resource model doc, the resource-types.xml and node-resource.xml 
> description tables have the wrong number of columns defined, see 
> [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceModel.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8288) Fix wrong number of table columns in Resource Model doc

2018-05-14 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8288:
--
Attachment: YARN-8288.001.patch

> Fix wrong number of table columns in Resource Model doc
> ---
>
> Key: YARN-8288
> URL: https://issues.apache.org/jira/browse/YARN-8288
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8288.001.patch, after.jpg, before.jpg
>
>
> In the resource model doc, the resource-types.xml and node-resource.xml 
> description tables have the wrong number of columns defined, see 
> [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceModel.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8288) Fix wrong number of table columns in Resource Model doc

2018-05-14 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8288:
--
Attachment: before.jpg

> Fix wrong number of table columns in Resource Model doc
> ---
>
> Key: YARN-8288
> URL: https://issues.apache.org/jira/browse/YARN-8288
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8288.001.patch, after.jpg, before.jpg
>
>
> In the resource model doc, the resource-types.xml and node-resource.xml 
> description tables have the wrong number of columns defined, see 
> [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceModel.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8288) Fix wrong number of table columns in Resource Model doc

2018-05-14 Thread Weiwei Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated YARN-8288:
--
Attachment: after.jpg

> Fix wrong number of table columns in Resource Model doc
> ---
>
> Key: YARN-8288
> URL: https://issues.apache.org/jira/browse/YARN-8288
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>Priority: Major
> Attachments: YARN-8288.001.patch, after.jpg, before.jpg
>
>
> In the resource model doc, the resource-types.xml and node-resource.xml 
> description tables have the wrong number of columns defined, see 
> [https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceModel.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8288) Fix wrong number of table columns in Resource Model doc

2018-05-14 Thread Weiwei Yang (JIRA)
Weiwei Yang created YARN-8288:
-

 Summary: Fix wrong number of table columns in Resource Model doc
 Key: YARN-8288
 URL: https://issues.apache.org/jira/browse/YARN-8288
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Weiwei Yang
Assignee: Weiwei Yang


In the resource model doc, the resource-types.xml and node-resource.xml description 
tables have the wrong number of columns defined, see 
[https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceModel.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7933) [atsv2 read acls] Add TimelineWriter#writeDomain

2018-05-14 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474013#comment-16474013
 ] 

Rohith Sharma K S commented on YARN-7933:
-

bq. Isn't it the case that a TimelineClient must be able to authenticate with 
the TimelineCollector first before it can post data to that TimelineCollector?
The intention here follows the ATS 1.5 approach, i.e. if the same client or a 
different client publishes the same domain id, then the collector needs to check the 
ACLs for the domain, i.e. the owner. In our design, I was not sure whether we should 
check the owner for the domain id, so I added a TODO. If this is not our design in 
future, we can remove it at any point of time.

bq. Where do we check the delegation token inside 
PerNodeTimelineCollectorService?
Timeline token verification happens at the filter layer at the time of HTTP 
connection establishment, i.e. even before the request reaches the servlets. Follow 
the classes NodeTimelineCollectorManager#startWebApp and TimelineAuthenticationFilter.
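
To illustrate the "checked at the filter layer, before any servlet runs" point, here 
is a generic javax.servlet filter skeleton (this is not the real 
TimelineAuthenticationFilter; the header name and the check are placeholders):

{code:java}
import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Illustrative skeleton only -- not the actual TimelineAuthenticationFilter.
public class TokenCheckFilter implements Filter {

  @Override
  public void init(FilterConfig filterConfig) {
    // A real filter would initialize its secret manager / auth handler here.
  }

  @Override
  public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
      throws IOException, ServletException {
    HttpServletRequest httpReq = (HttpServletRequest) req;
    HttpServletResponse httpResp = (HttpServletResponse) resp;

    // Token verification happens here, before the request reaches any servlet.
    String token = httpReq.getHeader("X-Timeline-Token");  // placeholder header name
    if (token == null || token.isEmpty()) {
      httpResp.sendError(HttpServletResponse.SC_UNAUTHORIZED, "missing token");
      return;  // the servlet is never invoked
    }
    chain.doFilter(req, resp);
  }

  @Override
  public void destroy() {
  }
}
{code}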



> [atsv2 read acls] Add TimelineWriter#writeDomain 
> -
>
> Key: YARN-7933
> URL: https://issues.apache.org/jira/browse/YARN-7933
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vrushali C
>Assignee: Rohith Sharma K S
>Priority: Major
> Attachments: YARN-7933.01.patch, YARN-7933.02.patch, 
> YARN-7933.03.patch, YARN-7933.04.patch, YARN-7933.05.patch
>
>
>  
> Add an API TimelineWriter#writeDomain for writing the domain info 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8191) Fair scheduler: queue deletion without RM restart

2018-05-14 Thread Gergo Repas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16473961#comment-16473961
 ] 

Gergo Repas commented on YARN-8191:
---

[~wilfreds] Thanks for the explanation. I think returning *true* when 
removeEmptyIncompatibleQueues() does not remove the queue is intentional, as 
the javadoc of removeEmptyIncompatibleQueues() says: @return true if we can 
create queueToCreate or it already exists. But this is rather confusing; I'm 
thinking of a way to refactor this (possibly by splitting up 
removeEmptyIncompatibleQueues()).
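
One hypothetical shape such a split could take (the names and types below are purely 
illustrative and do not correspond to the actual FairScheduler code): separate the 
side-effect-free "can the path be created?" question from the "remove whatever is in 
the way" mutation, so the boolean answer is no longer ambiguous.

{code:java}
// Purely illustrative sketch of the split discussed above.
public class QueueCreationCheckSketch {

  enum Compatibility { CREATABLE, ALREADY_EXISTS, BLOCKED }

  interface QueueView {
    boolean exists(String queuePath);
    boolean isRemovableEmptyLeaf(String queuePath);
  }

  // Step 1: a pure check with an explicit three-way answer and no side effects.
  static Compatibility checkCompatibility(QueueView queues, String queueToCreate) {
    if (queues.exists(queueToCreate)) {
      return Compatibility.ALREADY_EXISTS;
    }
    // Walk the ancestors of queueToCreate; a non-removable leaf ancestor
    // would make the path BLOCKED.
    return Compatibility.CREATABLE;
  }

  // Step 2: the mutation, invoked only after the caller decided to go ahead.
  static void removeEmptyIncompatibleAncestors(QueueView queues, String queueToCreate) {
    // Drop the empty leaf queues that block the new path.
  }
}
{code}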

> Fair scheduler: queue deletion without RM restart
> -
>
> Key: YARN-8191
> URL: https://issues.apache.org/jira/browse/YARN-8191
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 3.0.1
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Attachments: Queue Deletion in Fair Scheduler.pdf, 
> YARN-8191.000.patch, YARN-8191.001.patch, YARN-8191.002.patch, 
> YARN-8191.003.patch, YARN-8191.004.patch, YARN-8191.005.patch, 
> YARN-8191.006.patch, YARN-8191.007.patch, YARN-8191.008.patch, 
> YARN-8191.009.patch, YARN-8191.010.patch
>
>
> The Fair Scheduler never cleans up queues even if they are deleted in the 
> allocation file, or were dynamically created and are never going to be used 
> again. Queues always remain in memory, which leads to the following two issues.
>  # Steady fairshares aren’t calculated correctly due to remaining queues
>  # WebUI shows deleted queues, which is confusing for users (YARN-4022).
> We want to support proper queue deletion without restarting the Resource 
> Manager:
>  # Static queues without any entries that are removed from fair-scheduler.xml 
> should be deleted from memory.
>  # Dynamic queues without any entries should be deleted.
>  # RM Web UI should only show the queues defined in the scheduler at that 
> point in time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8273) Log aggregation does not warn if HDFS quota in target directory is exceeded

2018-05-14 Thread Szilard Nemeth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16473925#comment-16473925
 ] 

Szilard Nemeth commented on YARN-8273:
--

Hi [~grepas]!

Thanks for your patch.

Here are some things I noticed: 
 # In AppLogAggregatorImpl you added 2 {{LOG.warn(...)}} statements; I think 
they should be LOG.error instead.
 # In 
{{org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl#uploadLogsForContainers}}:
 you declared {{exc}} as a {{RuntimeException}}, but the current code does not 
leverage this, since {{exc}} will only ever be assigned an instance of 
{{LogAggregationDFSException}}, so you could directly use the type 
{{LogAggregationDFSException}} instead (see the sketch below).
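
A minimal sketch of what review point 2 suggests (the surrounding class, the method 
body and the exception definition here are made up for illustration; only the 
exception name comes from the review comment):

{code:java}
// Illustrative stand-in; the real class lives in the YARN-8273 patch.
class LogAggregationDFSException extends Exception {
  LogAggregationDFSException(Throwable cause) {
    super(cause);
  }
}

class UploadSketch {
  void uploadLogsForContainers() throws LogAggregationDFSException {
    // Declare the local with the narrowest type it can ever hold, as suggested
    // in the review, instead of the wider RuntimeException.
    LogAggregationDFSException deferred = null;
    try {
      // ... upload work that may fail against HDFS ...
    } catch (Exception e) {
      deferred = new LogAggregationDFSException(e);
    }
    if (deferred != null) {
      throw deferred;  // surfaced after cleanup, no cast needed
    }
  }
}
{code}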

 

> Log aggregation does not warn if HDFS quota in target directory is exceeded
> ---
>
> Key: YARN-8273
> URL: https://issues.apache.org/jira/browse/YARN-8273
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.1.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Attachments: YARN-8273.000.patch
>
>
> It appears that if an HDFS space quota is set on a target directory for log 
> aggregation and the quota is already exceeded when log aggregation is 
> attempted, zero-byte log files will be written to the HDFS directory, however 
> NodeManager logs do not reflect a failure to write the files successfully 
> (i.e. there are no ERROR or WARN messages to this effect).
> An improvement may be worth investigating to alert users to this scenario, as 
> otherwise logs for a YARN application may be missing both on HDFS and locally 
> (after local log cleanup is done) and the user may not otherwise be informed.
> Steps to reproduce:
> * Set a small HDFS space quota on /tmp/logs/username/logs (e.g. 2MB)
> * Write files to HDFS such that /tmp/logs/username/logs is almost 2MB full
> * Run a Spark or MR job in the cluster
> * Observe that zero byte files are written to HDFS after job completion
> * Observe that YARN container logs are also not present on the NM hosts (or 
> are deleted after yarn.nodemanager.delete.debug-delay-sec)
> * Observe that no ERROR or WARN messages appear to be logged in the NM role 
> log
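
For the first reproduction step above, the quota can be set with the standard HDFS 
admin CLI (the path and the 2 MB value are the ones from the steps):

{code}
# Set a 2 MB space quota on the aggregation directory (step 1 above)
hdfs dfsadmin -setSpaceQuota 2m /tmp/logs/username/logs

# Verify the quota and the current usage
hdfs dfs -count -q -h /tmp/logs/username/logs
{code}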



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8268) Fair scheduler: reservable queue is configured both as parent and leaf queue

2018-05-14 Thread Gergo Repas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16473906#comment-16473906
 ] 

Gergo Repas commented on YARN-8268:
---

[~haibochen] - thanks for the review and committing the change!
[~wilfreds] - thanks for the review!

> Fair scheduler: reservable queue is configured both as parent and leaf queue
> 
>
> Key: YARN-8268
> URL: https://issues.apache.org/jira/browse/YARN-8268
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 3.1.0
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8268.000.patch, YARN-8268.001.patch
>
>
> The following allocation file
> {code:java}
> 
> 
> 
>   someuser 
>   
> someuser 
>   
>   
> 
> 
> someuser 
>   
> 
> drf
> 
> {code}
> is being parsed as: {{PARENT=[root, root.dedicated], LEAF=[root.default, 
> root.dedicated]}} (AllocationConfiguration.configuredQueues).
> The root.dedicated should only appear as a PARENT queue.
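
The allocation file quoted in the description lost its XML markup in the archive; only 
the values "someuser" and "drf" survive. A minimal file consistent with those values 
and with the reported parse result might look like the following (the exact original 
nesting is an assumption; the element names are standard Fair Scheduler 
allocation-file elements):

{code:xml}
<?xml version="1.0"?>
<allocations>
  <queue name="dedicated">
    <aclAdministerApps>someuser</aclAdministerApps>
    <aclSubmitApps>someuser</aclSubmitApps>
    <!-- Marking the queue reservable is what exposes the bug:
         root.dedicated ends up in both the PARENT and LEAF sets. -->
    <reservation></reservation>
  </queue>
  <defaultQueueSchedulingPolicy>drf</defaultQueueSchedulingPolicy>
</allocations>
{code}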



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org