[jira] [Comment Edited] (YARN-8579) New AM attempt could not retrieve previous attempt component data

2018-07-27 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16560218#comment-16560218
 ] 

Chandni Singh edited comment on YARN-8579 at 7/27/18 8:01 PM:
--

[~gsaha] Thanks for debugging the issue. Patch 2 looks good to me.

Just a nitpick: since we use slf4j, we can use parameterized logging instead of
string concatenation in the log statement
{code:java}
LOG.info("Containers recovered after AM registered: " + containers);
{code} 
to 
{code:java}
LOG.info("Containers recovered after AM registered: {} ", containers);
{code}


was (Author: csingh):
[~gsaha] Thanks for debugging the issue. patch 2 looks good to me. 

Just a nitpick. Since we use slf4j, we can use it instead of string 
concatenation in the log stmt
{code:java}
LOG.info("Containers recovered after AM registered: ", containers);
{code}
to 
{code:java}
LOG.info("Containers recovered after AM registered: {} ", containers);
{code}

> New AM attempt could not retrieve previous attempt component data
> -
>
> Key: YARN-8579
> URL: https://issues.apache.org/jira/browse/YARN-8579
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8579.001.patch, YARN-8579.002.patch
>
>
> Steps:
> 1) Launch httpd-docker
> 2) Wait for app to be in STABLE state
> 3) Run validation for app (It takes around 3 mins)
> 4) Stop all Zks 
> 5) Wait 60 sec
> 6) Kill AM
> 7) wait for 30 sec
> 8) Start all ZKs
> 9) Wait for application to finish
> 10) Validate expected containers of the app
> Expected behavior:
> New attempt of AM should start and docker containers launched by 1st attempt 
> should be recovered by new attempt.
> Actual behavior:
> New AM attempt starts. It can not recover 1st attempt docker containers. It 
> can not read component details from ZK. 
> Thus, it starts new attempt for all containers.
> {code}
> 2018-07-19 22:42:47,595 [main] INFO  service.ServiceScheduler - Registering 
> appattempt_1531977563978_0015_02, fault-test-zkrm-httpd-docker into 
> registry
> 2018-07-19 22:42:47,611 [main] INFO  service.ServiceScheduler - Received 1 
> containers from previous attempt.
> 2018-07-19 22:42:47,642 [main] INFO  service.ServiceScheduler - Could not 
> read component paths: 
> `/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components':
>  No such file or directory: KeeperErrorCode = NoNode for 
> /registry/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Handling 
> container_e08_1531977563978_0015_01_03 from previous attempt
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Record not 
> found in registry for container container_e08_1531977563978_0015_01_03 
> from previous attempt, releasing
> 2018-07-19 22:42:47,649 [AMRM Callback Handler Thread] INFO  
> impl.TimelineV2ClientImpl - Updated timeline service address to xxx:33019
> 2018-07-19 22:42:47,651 [main] INFO  service.ServiceScheduler - Triggering 
> initial evaluation of component httpd
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [INIT COMPONENT 
> httpd]: 2 instances.
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [COMPONENT httpd] 
> Requesting for 2 container(s){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8579) New AM attempt could not retrieve previous attempt component data

2018-07-27 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16560218#comment-16560218
 ] 

Chandni Singh commented on YARN-8579:
-

[~gsaha] Thanks for debugging the issue. Patch 2 looks good to me.

Just a nitpick: since we use slf4j, we can use parameterized logging instead of
string concatenation in the log statement
{code:java}
LOG.info("Containers recovered after AM registered: ", containers);
{code}
to 
{code:java}
LOG.info("Containers recovered after AM registered: {} ", containers);
{code}

> New AM attempt could not retrieve previous attempt component data
> -
>
> Key: YARN-8579
> URL: https://issues.apache.org/jira/browse/YARN-8579
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Gour Saha
>Priority: Critical
> Fix For: 3.2.0, 3.1.2
>
> Attachments: YARN-8579.001.patch, YARN-8579.002.patch
>
>
> Steps:
> 1) Launch httpd-docker
> 2) Wait for app to be in STABLE state
> 3) Run validation for app (It takes around 3 mins)
> 4) Stop all Zks 
> 5) Wait 60 sec
> 6) Kill AM
> 7) wait for 30 sec
> 8) Start all ZKs
> 9) Wait for application to finish
> 10) Validate expected containers of the app
> Expected behavior:
> New attempt of AM should start and docker containers launched by 1st attempt 
> should be recovered by new attempt.
> Actual behavior:
> New AM attempt starts. It can not recover 1st attempt docker containers. It 
> can not read component details from ZK. 
> Thus, it starts new attempt for all containers.
> {code}
> 2018-07-19 22:42:47,595 [main] INFO  service.ServiceScheduler - Registering 
> appattempt_1531977563978_0015_02, fault-test-zkrm-httpd-docker into 
> registry
> 2018-07-19 22:42:47,611 [main] INFO  service.ServiceScheduler - Received 1 
> containers from previous attempt.
> 2018-07-19 22:42:47,642 [main] INFO  service.ServiceScheduler - Could not 
> read component paths: 
> `/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components':
>  No such file or directory: KeeperErrorCode = NoNode for 
> /registry/users/hrt-qa/services/yarn-service/fault-test-zkrm-httpd-docker/components
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Handling 
> container_e08_1531977563978_0015_01_03 from previous attempt
> 2018-07-19 22:42:47,643 [main] INFO  service.ServiceScheduler - Record not 
> found in registry for container container_e08_1531977563978_0015_01_03 
> from previous attempt, releasing
> 2018-07-19 22:42:47,649 [AMRM Callback Handler Thread] INFO  
> impl.TimelineV2ClientImpl - Updated timeline service address to xxx:33019
> 2018-07-19 22:42:47,651 [main] INFO  service.ServiceScheduler - Triggering 
> initial evaluation of component httpd
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [INIT COMPONENT 
> httpd]: 2 instances.
> 2018-07-19 22:42:47,652 [main] INFO  component.Component - [COMPONENT httpd] 
> Requesting for 2 container(s){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8584) Several typos in Log Aggregation related classes

2018-07-27 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559325#comment-16559325
 ] 

Chandni Singh commented on YARN-8584:
-

[~snemeth] Thanks! LGTM

> Several typos in Log Aggregation related classes
> 
>
> Key: YARN-8584
> URL: https://issues.apache.org/jira/browse/YARN-8584
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
> Attachments: YARN-8584.001.patch, YARN-8584.002.patch
>
>
> There are typos in comments, log messages, method names, field names, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8509) Fix UserLimit calculation for preemption to balance scenario after queue satisfied

2018-07-26 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559043#comment-16559043
 ] 

Chandni Singh commented on YARN-8509:
-

A couple of nits and questions:

1. This is a Javadoc block, so it should be placed above the method. I understand
that this test was moved out of another test class, but this is a good opportunity
to fix it; a sketch of the suggested placement follows the snippet below.
{code:java}
/**
 * Test case: Submit three applications (app1/app2/app3) to different
 * queues, queue structure:
 *
 *         Root
 *        /  |  \  \
 *       a   b   c  d
 *      30  30  30  10
 *
 */
{code}
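
For illustration only, a rough sketch of the suggested placement (the test method
name here is hypothetical, not the one in the patch):
{code:java}
/**
 * Test case: Submit three applications (app1/app2/app3) to different
 * queues, queue structure:
 *
 *         Root
 *        /  |  \  \
 *       a   b   c  d
 *      30  30  30  10
 */
@Test
public void testPendingResourcesWithQueueStructure() throws Exception {
  // test body unchanged; only the Javadoc placement changes
}
{code}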

2. Why is the log level explicitly set to DEBUG in the code?
{code}
Logger.getRootLogger().setLevel(Level.DEBUG);
{code}

3. Can you explain the comment? 
{code}
   // We should release pending resource be capped at user limit, think about
// a user ask for 1maps. but cluster can run a max of 1000. In this
// case, as soon as each map finish, other one pending will get scheduled
// When not deduct reserved, total-pending = 3G (u1) + 20G (u2) = 23G
//  deduct reserved, total-pending = 0G (u1) + 20G (u2) = 20G
{code}


> Fix UserLimit calculation for preemption to balance scenario after queue 
> satisfied  
> 
>
> Key: YARN-8509
> URL: https://issues.apache.org/jira/browse/YARN-8509
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Zian Chen
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-8509.001.patch, YARN-8509.002.patch
>
>
> In LeafQueue#getTotalPendingResourcesConsideringUserLimit, we calculate total 
> pending resource based on user-limit percent and user-limit factor, which caps 
> the pending resource for each user at the minimum of the user-limit pending and 
> the actual pending. This prevents the queue from taking more pending resource to 
> achieve queue balance after all queues are satisfied with their ideal allocation.
>   
>  We need to change the logic to let queue pending resources go beyond the user limit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8584) Several typos in Log Aggregation related classes

2018-07-26 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559023#comment-16559023
 ] 

Chandni Singh commented on YARN-8584:
-

Looks good.

We can also change the log statements to utilize slf4j instead of concatenating 
strings. 
For example
{code:java}
LOG.warn("rollingMonitorInterval should be more than or equal to " + 
MIN_LOG_ROLLING_INTERVAL + " seconds. Using " + MIN_LOG_ROLLING_INTERVAL + " 
seconds instead.");{code}
to 
{code:java}
LOG.warn("rollingMonitorInterval should be more than or equal to {} seconds. 
Using {} seconds instead.", MIN_LOG_ROLLING_INTERVAL, 
MIN_LOG_ROLLING_INTERVAL);{code}
 

> Several typos in Log Aggregation related classes
> 
>
> Key: YARN-8584
> URL: https://issues.apache.org/jira/browse/YARN-8584
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Minor
> Attachments: YARN-8584.001.patch
>
>
> There are typos in comments, log messages, method names, field names, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8508) GPU does not get released even though the container is killed

2018-07-26 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558838#comment-16558838
 ] 

Chandni Singh commented on YARN-8508:
-

[~shaneku...@gmail.com] [~eyang] could you please review patch 2?

> GPU  does not get released even though the container is killed
> --
>
> Key: YARN-8508
> URL: https://issues.apache.org/jira/browse/YARN-8508
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8505.001.patch, YARN-8505.002.patch
>
>
> GPU failed to release even though the container using it is being killed
> {Code}
> 2018-07-06 05:22:26,201 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_01 transitioned from RUNNING to 
> KILLING
> 2018-07-06 05:22:26,250 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_02 transitioned from RUNNING to 
> KILLING
> 2018-07-06 05:22:26,251 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(632)) - Application 
> application_1530854311763_0006 transitioned from RUNNING to 
> FINISHING_CONTAINERS_WAIT
> 2018-07-06 05:22:26,251 INFO  launcher.ContainerLaunch 
> (ContainerLaunch.java:cleanupContainer(734)) - Cleaning up container 
> container_e20_1530854311763_0006_01_02
> 2018-07-06 05:22:31,358 INFO  launcher.ContainerLaunch 
> (ContainerLaunch.java:getContainerPid(1102)) - Could not get pid for 
> container_e20_1530854311763_0006_01_02. Waited for 5000 ms.
> 2018-07-06 05:22:31,358 WARN  launcher.ContainerLaunch 
> (ContainerLaunch.java:cleanupContainer(784)) - Container clean up before pid 
> file created container_e20_1530854311763_0006_01_02
> 2018-07-06 05:22:31,359 INFO  launcher.ContainerLaunch 
> (ContainerLaunch.java:reapDockerContainerNoPid(940)) - Unable to obtain pid, 
> but docker container request detected. Attempting to reap container 
> container_e20_1530854311763_0006_01_02
> 2018-07-06 05:22:31,494 INFO  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:deleteAsUser(828)) - Deleting absolute path : 
> /grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1530854311763_0006/container_e20_1530854311763_0006_01_02/launch_container.sh
> 2018-07-06 05:22:31,500 INFO  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:deleteAsUser(828)) - Deleting absolute path : 
> /grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1530854311763_0006/container_e20_1530854311763_0006_01_02/container_tokens
> 2018-07-06 05:22:31,510 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_01 transitioned from KILLING to 
> CONTAINER_CLEANEDUP_AFTER_KILL
> 2018-07-06 05:22:31,510 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_02 transitioned from KILLING to 
> CONTAINER_CLEANEDUP_AFTER_KILL
> 2018-07-06 05:22:31,512 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_01 transitioned from 
> CONTAINER_CLEANEDUP_AFTER_KILL to DONE
> 2018-07-06 05:22:31,513 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_02 transitioned from 
> CONTAINER_CLEANEDUP_AFTER_KILL to DONE
> 2018-07-06 05:22:38,955 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0007_01_02 transitioned from NEW to SCHEDULED
> {Code}
> New container requesting for GPU fails to launch
> {code}
> 2018-07-06 05:22:39,048 ERROR nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleLaunchForLaunchType(550)) - 
> ResourceHandlerChain.preStart() failed!
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException:
>  Failed to find enough GPUs, 
> requestor=container_e20_1530854311763_0007_01_02, #RequestedGPUs=2, 
> #availableGpus=1
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceAllocator.internalAssignGpus(GpuResourceAllocator.java:225)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceAllocator.assignGpus(GpuResourceAllocator.java:173)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceHandlerImpl.preStart(GpuResourceHandlerImpl.java:98)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain.preStart(ResourceHandlerChain.java:75)
>   at 
> 

[jira] [Updated] (YARN-8508) GPU does not get released even though the container is killed

2018-07-26 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8508:

Attachment: YARN-8505.002.patch

> GPU  does not get released even though the container is killed
> --
>
> Key: YARN-8508
> URL: https://issues.apache.org/jira/browse/YARN-8508
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8505.001.patch, YARN-8505.002.patch
>
>
> GPU failed to release even though the container using it is being killed
> {Code}
> 2018-07-06 05:22:26,201 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_01 transitioned from RUNNING to 
> KILLING
> 2018-07-06 05:22:26,250 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_02 transitioned from RUNNING to 
> KILLING
> 2018-07-06 05:22:26,251 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(632)) - Application 
> application_1530854311763_0006 transitioned from RUNNING to 
> FINISHING_CONTAINERS_WAIT
> 2018-07-06 05:22:26,251 INFO  launcher.ContainerLaunch 
> (ContainerLaunch.java:cleanupContainer(734)) - Cleaning up container 
> container_e20_1530854311763_0006_01_02
> 2018-07-06 05:22:31,358 INFO  launcher.ContainerLaunch 
> (ContainerLaunch.java:getContainerPid(1102)) - Could not get pid for 
> container_e20_1530854311763_0006_01_02. Waited for 5000 ms.
> 2018-07-06 05:22:31,358 WARN  launcher.ContainerLaunch 
> (ContainerLaunch.java:cleanupContainer(784)) - Container clean up before pid 
> file created container_e20_1530854311763_0006_01_02
> 2018-07-06 05:22:31,359 INFO  launcher.ContainerLaunch 
> (ContainerLaunch.java:reapDockerContainerNoPid(940)) - Unable to obtain pid, 
> but docker container request detected. Attempting to reap container 
> container_e20_1530854311763_0006_01_02
> 2018-07-06 05:22:31,494 INFO  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:deleteAsUser(828)) - Deleting absolute path : 
> /grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1530854311763_0006/container_e20_1530854311763_0006_01_02/launch_container.sh
> 2018-07-06 05:22:31,500 INFO  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:deleteAsUser(828)) - Deleting absolute path : 
> /grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1530854311763_0006/container_e20_1530854311763_0006_01_02/container_tokens
> 2018-07-06 05:22:31,510 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_01 transitioned from KILLING to 
> CONTAINER_CLEANEDUP_AFTER_KILL
> 2018-07-06 05:22:31,510 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_02 transitioned from KILLING to 
> CONTAINER_CLEANEDUP_AFTER_KILL
> 2018-07-06 05:22:31,512 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_01 transitioned from 
> CONTAINER_CLEANEDUP_AFTER_KILL to DONE
> 2018-07-06 05:22:31,513 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_02 transitioned from 
> CONTAINER_CLEANEDUP_AFTER_KILL to DONE
> 2018-07-06 05:22:38,955 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0007_01_02 transitioned from NEW to SCHEDULED
> {Code}
> New container requesting for GPU fails to launch
> {code}
> 2018-07-06 05:22:39,048 ERROR nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleLaunchForLaunchType(550)) - 
> ResourceHandlerChain.preStart() failed!
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException:
>  Failed to find enough GPUs, 
> requestor=container_e20_1530854311763_0007_01_02, #RequestedGPUs=2, 
> #availableGpus=1
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceAllocator.internalAssignGpus(GpuResourceAllocator.java:225)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceAllocator.assignGpus(GpuResourceAllocator.java:173)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceHandlerImpl.preStart(GpuResourceHandlerImpl.java:98)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain.preStart(ResourceHandlerChain.java:75)
>   at 
> 

[jira] [Commented] (YARN-8545) YARN native service should return container if launch failed

2018-07-26 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558835#comment-16558835
 ] 

Chandni Singh commented on YARN-8545:
-

[~billie.rinaldi] [~eyang] Do you have any comments on patch 1?

> YARN native service should return container if launch failed
> 
>
> Key: YARN-8545
> URL: https://issues.apache.org/jira/browse/YARN-8545
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
> Attachments: YARN-8545.001.patch
>
>
> In some cases, container launch may fail but container will not be properly 
> returned to RM. 
> This could happen when AM trying to prepare container launch context but 
> failed w/o sending container launch context to NM (Once container launch 
> context is sent to NM, NM will report failed container to RM).
> Exception like: 
> {code:java}
> java.io.FileNotFoundException: File does not exist: 
> hdfs://ns1/user/wtan/.yarn/services/tf-job-001/components/1531852429056/primary-worker/primary-worker-0/run-PRIMARY_WORKER.sh
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1576)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1591)
>   at 
> org.apache.hadoop.yarn.service.utils.CoreFileSystem.createAmResource(CoreFileSystem.java:388)
>   at 
> org.apache.hadoop.yarn.service.provider.ProviderUtils.createConfigFileAndAddLocalResource(ProviderUtils.java:253)
>   at 
> org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:152)
>   at 
> org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:105)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745){code}
> And even after container launch context prepare failed, AM still trying to 
> monitor container's readiness:
> {code:java}
> 2018-07-17 18:42:57,518 [pool-7-thread-1] INFO  monitor.ServiceMonitor - 
> Readiness check failed for primary-worker-0: Probe Status, time="Tue Jul 17 
> 18:42:57 UTC 2018", outcome="failure", message="Failure in Default probe: IP 
> presence", exception="java.io.IOException: primary-worker-0: IP is not 
> available yet"
> ...{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8508) GPU does not get released even though the container is killed

2018-07-26 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8508:

Attachment: YARN-8505.001.patch

> GPU  does not get released even though the container is killed
> --
>
> Key: YARN-8508
> URL: https://issues.apache.org/jira/browse/YARN-8508
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8505.001.patch
>
>
> GPU failed to release even though the container using it is being killed
> {Code}
> 2018-07-06 05:22:26,201 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_01 transitioned from RUNNING to 
> KILLING
> 2018-07-06 05:22:26,250 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_02 transitioned from RUNNING to 
> KILLING
> 2018-07-06 05:22:26,251 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(632)) - Application 
> application_1530854311763_0006 transitioned from RUNNING to 
> FINISHING_CONTAINERS_WAIT
> 2018-07-06 05:22:26,251 INFO  launcher.ContainerLaunch 
> (ContainerLaunch.java:cleanupContainer(734)) - Cleaning up container 
> container_e20_1530854311763_0006_01_02
> 2018-07-06 05:22:31,358 INFO  launcher.ContainerLaunch 
> (ContainerLaunch.java:getContainerPid(1102)) - Could not get pid for 
> container_e20_1530854311763_0006_01_02. Waited for 5000 ms.
> 2018-07-06 05:22:31,358 WARN  launcher.ContainerLaunch 
> (ContainerLaunch.java:cleanupContainer(784)) - Container clean up before pid 
> file created container_e20_1530854311763_0006_01_02
> 2018-07-06 05:22:31,359 INFO  launcher.ContainerLaunch 
> (ContainerLaunch.java:reapDockerContainerNoPid(940)) - Unable to obtain pid, 
> but docker container request detected. Attempting to reap container 
> container_e20_1530854311763_0006_01_02
> 2018-07-06 05:22:31,494 INFO  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:deleteAsUser(828)) - Deleting absolute path : 
> /grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1530854311763_0006/container_e20_1530854311763_0006_01_02/launch_container.sh
> 2018-07-06 05:22:31,500 INFO  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:deleteAsUser(828)) - Deleting absolute path : 
> /grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1530854311763_0006/container_e20_1530854311763_0006_01_02/container_tokens
> 2018-07-06 05:22:31,510 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_01 transitioned from KILLING to 
> CONTAINER_CLEANEDUP_AFTER_KILL
> 2018-07-06 05:22:31,510 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_02 transitioned from KILLING to 
> CONTAINER_CLEANEDUP_AFTER_KILL
> 2018-07-06 05:22:31,512 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_01 transitioned from 
> CONTAINER_CLEANEDUP_AFTER_KILL to DONE
> 2018-07-06 05:22:31,513 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_02 transitioned from 
> CONTAINER_CLEANEDUP_AFTER_KILL to DONE
> 2018-07-06 05:22:38,955 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0007_01_02 transitioned from NEW to SCHEDULED
> {Code}
> New container requesting for GPU fails to launch
> {code}
> 2018-07-06 05:22:39,048 ERROR nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleLaunchForLaunchType(550)) - 
> ResourceHandlerChain.preStart() failed!
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException:
>  Failed to find enough GPUs, 
> requestor=container_e20_1530854311763_0007_01_02, #RequestedGPUs=2, 
> #availableGpus=1
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceAllocator.internalAssignGpus(GpuResourceAllocator.java:225)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceAllocator.assignGpus(GpuResourceAllocator.java:173)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceHandlerImpl.preStart(GpuResourceHandlerImpl.java:98)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain.preStart(ResourceHandlerChain.java:75)
>   at 
> 

[jira] [Commented] (YARN-8508) GPU does not get released even though the container is killed

2018-07-26 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558571#comment-16558571
 ] 

Chandni Singh commented on YARN-8508:
-

This happens with a container that gets cleaned up before its pid file is
created. To solve it, we need to release the resources at the end of
{{LinuxContainerExecutor.reapContainer()}}, just like we do in
{{LinuxContainerExecutor.launchContainer()}},
{{LinuxContainerExecutor.reLaunchContainer()}}, and
{{LinuxContainerExecutor.reacquireContainer()}}.
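
For illustration only, a rough sketch of the idea (this is not the actual patch;
the helper name is hypothetical, and the body mirrors the post-execution cleanup
quoted further below):
{code:java}
// Sketch: release NM-side resources (cgroups, GPUs, etc.) when a container is
// reaped, mirroring the cleanup done after a normal container execution.
private void postCompleteResources(ContainerId containerId) {
  resourcesHandler.postExecute(containerId);
  try {
    if (resourceHandlerChain != null) {
      LOG.info("{} POST Complete", containerId);
      resourceHandlerChain.postComplete(containerId);
    }
  } catch (ResourceHandlerException e) {
    LOG.warn("ResourceHandlerChain.postComplete failed for " +
        "containerId: " + containerId + ". Exception: " + e);
  }
}
{code}
{{reapContainer()}} could invoke something like this at the end, performing the
same cleanup that the launch, relaunch, and reacquire paths already do.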

Please see my explanation below:
Refer to {{container_e21_1532545600682_0001_01_02}} in
yarn8505.nodemanager.log

- 002 is launched but its pid file is not created
{code}
2018-07-25 19:08:54,409 DEBUG util.ProcessIdFileReader 
(ProcessIdFileReader.java:getProcessId(53)) - Accessing pid from pid file 
/.../application_1532545600682_0001/container_e21_1532545600682_0001_01_02/container_e21_1532545600682_0001_01_02.pid
2018-07-25 19:08:54,409 DEBUG util.ProcessIdFileReader 
(ProcessIdFileReader.java:getProcessId(103)) - Got pid null from path 
/.../application_1532545600682_0001/container_e21_1532545600682_0001_01_02/container_e21_1532545600682_0001_01_02.pid
{code}

- Since the application is killed, 002 is killed by the ResourceManager
{code}
2018-07-25 19:08:54,643 DEBUG container.ContainerImpl 
(ContainerImpl.java:handle(2080)) - Processing 
container_e21_1532545600682_0001_01_02 of type CONTAINER_KILLED_ON_REQUEST
{code}

- The above triggers {{ContainerLaunch.cleanupContainer()}} for 002. This
happens before the pid file is created.
{code}
2018-07-25 19:08:54,409 WARN launcher.ContainerLaunch 
(ContainerLaunch.java:cleanupContainer(784)) - Container clean up before pid 
file created container_e21_1532545600682_0001_01_02
{code}

- {{cleanupContainer}} invokes {{reapDockerContainerNoPid(user)}}
{code}
2018-07-25 19:08:54,410 INFO launcher.ContainerLaunch 
(ContainerLaunch.java:reapDockerContainerNoPid(940)) - Unable to obtain pid, 
but docker container request detected. Attempting to reap container 
container_e21_1532545600682_0001_01_02
{code}

- {{reapDockerContainerNoPid(user)}} calls {{exec.reapContainer(...)}}
{code}
2018-07-25 19:08:54,412 DEBUG docker.DockerCommandExecutor 
(DockerCommandExecutor.java:executeDockerCommand(89)) - Running docker command: 
inspect docker-command=inspect format=\{{.State.Status}} 
name=container_e21_1532545600682_0001_01_02
2018-07-25 19:08:54,412 DEBUG privileged.PrivilegedOperationExecutor 
(PrivilegedOperationExecutor.java:getPrivilegedOperationExecutionCommand(119)) 
- Privileged Execution Command Array: [/.../hadoop-yarn/bin/container-executor, 
--inspect-docker-container, --format=\{{.State.Status}}, 
container_e21_1532545600682_0001_01_02]
2018-07-25 19:08:54,530 DEBUG docker.DockerCommandExecutor 
(DockerCommandExecutor.java:getContainerStatus(160)) - Container Status: 
nonexistent ContainerId: container_e21_1532545600682_0001_01_02
2018-07-25 19:08:54,530 DEBUG launcher.ContainerLaunch 
(ContainerLaunch.java:reapDockerContainerNoPid(948)) - Sent signal to docker 
container container_e21_1532545600682_0001_01_02 as user hrt_qa, 
result=success
{code}

- The problem is that {{reapContainer}} in {{LinuxContainerExecutor}}
doesn't release the resources assigned to the container. The code snippet below,
which releases these resources after the container completes, is not executed at
this point.
{code}
resourcesHandler.postExecute(containerId);

try {
  if (resourceHandlerChain != null) {
    LOG.info("{} POST Complete", containerId);
    resourceHandlerChain.postComplete(containerId);
  }
} catch (ResourceHandlerException e) {
  LOG.warn("ResourceHandlerChain.postComplete failed for " +
      "containerId: " + containerId + ". Exception: " + e);
}
{code}

- The container launch fails after 4 minutes, and only then are the resources
released.
{code}
2018-07-25 19:12:09,999 WARN nodemanager.LinuxContainerExecutor 
(LinuxContainerExecutor.java:handleExitCode(593)) - Exit code from container 
container_e21_1532545600682_0001_01_02 is : 27
2018-07-25 19:12:10,000 WARN nodemanager.LinuxContainerExecutor 
(LinuxContainerExecutor.java:handleExitCode(599)) - Exception from 
container-launch with container ID: container_e21_1532545600682_0001_01_02 
and exit code: 27
2018-07-25 19:12:10,000 INFO nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Container id: 
container_e21_1532545600682_0001_01_02
2018-07-25 19:12:10,003 INFO nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Docker inspect command: 
/usr/bin/docker inspect --format \{{.State.Pid}} 
container_e21_1532545600682_0001_01_02
2018-07-25 19:12:10,003 INFO nodemanager.ContainerExecutor 
(ContainerExecutor.java:logOutput(541)) - Failed to write pid to file 
/cgroup/cpu/.../container_e21_1532545600682_0001_01_02/tasks - No such 

[jira] [Comment Edited] (YARN-8545) YARN native service should return container if launch failed

2018-07-23 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553426#comment-16553426
 ] 

Chandni Singh edited comment on YARN-8545 at 7/23/18 9:36 PM:
--

In Patch 1:
- releasing containers that failed
- removing failed containers from live instances
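
For illustration only, a minimal sketch of the idea (names like
{{handleLaunchFailure}} and {{liveInstances}} are hypothetical, not the actual
ones in the patch):
{code:java}
// Sketch: when building the launch context fails before anything is sent to
// the NM, return the container to the RM and stop tracking the instance, so
// the service does not keep probing a container that was never launched.
private void handleLaunchFailure(ComponentInstance instance,
    Container container) {
  // give the unused container back to the ResourceManager
  amRMClient.releaseAssignedContainer(container.getId());
  // stop tracking it as a live instance (hypothetical tracking map)
  liveInstances.remove(container.getId());
}
{code}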



was (Author: csingh):
In Patch 1 : 
- releasing containers that failed
- removing containers from live instances


> YARN native service should return container if launch failed
> 
>
> Key: YARN-8545
> URL: https://issues.apache.org/jira/browse/YARN-8545
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
> Attachments: YARN-8545.001.patch
>
>
> In some cases, container launch may fail but container will not be properly 
> returned to RM. 
> This could happen when AM trying to prepare container launch context but 
> failed w/o sending container launch context to NM (Once container launch 
> context is sent to NM, NM will report failed container to RM).
> Exception like: 
> {code:java}
> java.io.FileNotFoundException: File does not exist: 
> hdfs://ns1/user/wtan/.yarn/services/tf-job-001/components/1531852429056/primary-worker/primary-worker-0/run-PRIMARY_WORKER.sh
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1576)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1591)
>   at 
> org.apache.hadoop.yarn.service.utils.CoreFileSystem.createAmResource(CoreFileSystem.java:388)
>   at 
> org.apache.hadoop.yarn.service.provider.ProviderUtils.createConfigFileAndAddLocalResource(ProviderUtils.java:253)
>   at 
> org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:152)
>   at 
> org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:105)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745){code}
> And even after container launch context prepare failed, AM still trying to 
> monitor container's readiness:
> {code:java}
> 2018-07-17 18:42:57,518 [pool-7-thread-1] INFO  monitor.ServiceMonitor - 
> Readiness check failed for primary-worker-0: Probe Status, time="Tue Jul 17 
> 18:42:57 UTC 2018", outcome="failure", message="Failure in Default probe: IP 
> presence", exception="java.io.IOException: primary-worker-0: IP is not 
> available yet"
> ...{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8545) YARN native service should return container if launch failed

2018-07-23 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8545:

Attachment: YARN-8545.001.patch

> YARN native service should return container if launch failed
> 
>
> Key: YARN-8545
> URL: https://issues.apache.org/jira/browse/YARN-8545
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
> Attachments: YARN-8545.001.patch
>
>
> In some cases, container launch may fail but container will not be properly 
> returned to RM. 
> This could happen when AM trying to prepare container launch context but 
> failed w/o sending container launch context to NM (Once container launch 
> context is sent to NM, NM will report failed container to RM).
> Exception like: 
> {code:java}
> java.io.FileNotFoundException: File does not exist: 
> hdfs://ns1/user/wtan/.yarn/services/tf-job-001/components/1531852429056/primary-worker/primary-worker-0/run-PRIMARY_WORKER.sh
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1576)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1591)
>   at 
> org.apache.hadoop.yarn.service.utils.CoreFileSystem.createAmResource(CoreFileSystem.java:388)
>   at 
> org.apache.hadoop.yarn.service.provider.ProviderUtils.createConfigFileAndAddLocalResource(ProviderUtils.java:253)
>   at 
> org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:152)
>   at 
> org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:105)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745){code}
> And even after container launch context prepare failed, AM still trying to 
> monitor container's readiness:
> {code:java}
> 2018-07-17 18:42:57,518 [pool-7-thread-1] INFO  monitor.ServiceMonitor - 
> Readiness check failed for primary-worker-0: Probe Status, time="Tue Jul 17 
> 18:42:57 UTC 2018", outcome="failure", message="Failure in Default probe: IP 
> presence", exception="java.io.IOException: primary-worker-0: IP is not 
> available yet"
> ...{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8545) YARN native service should return container if launch failed

2018-07-23 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553422#comment-16553422
 ] 

Chandni Singh commented on YARN-8545:
-

[~gsaha] [~billie.rinaldi] could you please review the patch?


> YARN native service should return container if launch failed
> 
>
> Key: YARN-8545
> URL: https://issues.apache.org/jira/browse/YARN-8545
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
>
> In some cases, container launch may fail but container will not be properly 
> returned to RM. 
> This could happen when AM trying to prepare container launch context but 
> failed w/o sending container launch context to NM (Once container launch 
> context is sent to NM, NM will report failed container to RM).
> Exception like: 
> {code:java}
> java.io.FileNotFoundException: File does not exist: 
> hdfs://ns1/user/wtan/.yarn/services/tf-job-001/components/1531852429056/primary-worker/primary-worker-0/run-PRIMARY_WORKER.sh
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1576)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1591)
>   at 
> org.apache.hadoop.yarn.service.utils.CoreFileSystem.createAmResource(CoreFileSystem.java:388)
>   at 
> org.apache.hadoop.yarn.service.provider.ProviderUtils.createConfigFileAndAddLocalResource(ProviderUtils.java:253)
>   at 
> org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:152)
>   at 
> org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:105)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745){code}
> And even after container launch context prepare failed, AM still trying to 
> monitor container's readiness:
> {code:java}
> 2018-07-17 18:42:57,518 [pool-7-thread-1] INFO  monitor.ServiceMonitor - 
> Readiness check failed for primary-worker-0: Probe Status, time="Tue Jul 17 
> 18:42:57 UTC 2018", outcome="failure", message="Failure in Default probe: IP 
> presence", exception="java.io.IOException: primary-worker-0: IP is not 
> available yet"
> ...{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8301) Yarn Service Upgrade: Add documentation

2018-07-20 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8301:

Attachment: YARN-8301.007.patch

> Yarn Service Upgrade: Add documentation
> ---
>
> Key: YARN-8301
> URL: https://issues.apache.org/jira/browse/YARN-8301
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8301.001.patch, YARN-8301.002.patch, 
> YARN-8301.003.patch, YARN-8301.004.patch, YARN-8301.005.patch, 
> YARN-8301.006.patch, YARN-8301.007.patch
>
>
> Add documentation for yarn service upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8545) YARN native service should return container if launch failed

2018-07-20 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh reassigned YARN-8545:
---

Assignee: Chandni Singh

> YARN native service should return container if launch failed
> 
>
> Key: YARN-8545
> URL: https://issues.apache.org/jira/browse/YARN-8545
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
>
> In some cases, container launch may fail but container will not be properly 
> returned to RM. 
> This could happen when AM trying to prepare container launch context but 
> failed w/o sending container launch context to NM (Once container launch 
> context is sent to NM, NM will report failed container to RM).
> Exception like: 
> {code:java}
> java.io.FileNotFoundException: File does not exist: 
> hdfs://ns1/user/wtan/.yarn/services/tf-job-001/components/1531852429056/primary-worker/primary-worker-0/run-PRIMARY_WORKER.sh
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1583)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1576)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1591)
>   at 
> org.apache.hadoop.yarn.service.utils.CoreFileSystem.createAmResource(CoreFileSystem.java:388)
>   at 
> org.apache.hadoop.yarn.service.provider.ProviderUtils.createConfigFileAndAddLocalResource(ProviderUtils.java:253)
>   at 
> org.apache.hadoop.yarn.service.provider.AbstractProviderService.buildContainerLaunchContext(AbstractProviderService.java:152)
>   at 
> org.apache.hadoop.yarn.service.containerlaunch.ContainerLaunchService$ContainerLauncher.run(ContainerLaunchService.java:105)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745){code}
> And even after container launch context prepare failed, AM still trying to 
> monitor container's readiness:
> {code:java}
> 2018-07-17 18:42:57,518 [pool-7-thread-1] INFO  monitor.ServiceMonitor - 
> Readiness check failed for primary-worker-0: Probe Status, time="Tue Jul 17 
> 18:42:57 UTC 2018", outcome="failure", message="Failure in Default probe: IP 
> presence", exception="java.io.IOException: primary-worker-0: IP is not 
> available yet"
> ...{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8508) GPU does not get released even though the container is killed

2018-07-19 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh reassigned YARN-8508:
---

Assignee: Chandni Singh  (was: Wangda Tan)

> GPU  does not get released even though the container is killed
> --
>
> Key: YARN-8508
> URL: https://issues.apache.org/jira/browse/YARN-8508
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Sumana Sathish
>Assignee: Chandni Singh
>Priority: Major
>
> GPU failed to release even though the container using it is being killed
> {Code}
> 2018-07-06 05:22:26,201 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_01 transitioned from RUNNING to 
> KILLING
> 2018-07-06 05:22:26,250 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_02 transitioned from RUNNING to 
> KILLING
> 2018-07-06 05:22:26,251 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(632)) - Application 
> application_1530854311763_0006 transitioned from RUNNING to 
> FINISHING_CONTAINERS_WAIT
> 2018-07-06 05:22:26,251 INFO  launcher.ContainerLaunch 
> (ContainerLaunch.java:cleanupContainer(734)) - Cleaning up container 
> container_e20_1530854311763_0006_01_02
> 2018-07-06 05:22:31,358 INFO  launcher.ContainerLaunch 
> (ContainerLaunch.java:getContainerPid(1102)) - Could not get pid for 
> container_e20_1530854311763_0006_01_02. Waited for 5000 ms.
> 2018-07-06 05:22:31,358 WARN  launcher.ContainerLaunch 
> (ContainerLaunch.java:cleanupContainer(784)) - Container clean up before pid 
> file created container_e20_1530854311763_0006_01_02
> 2018-07-06 05:22:31,359 INFO  launcher.ContainerLaunch 
> (ContainerLaunch.java:reapDockerContainerNoPid(940)) - Unable to obtain pid, 
> but docker container request detected. Attempting to reap container 
> container_e20_1530854311763_0006_01_02
> 2018-07-06 05:22:31,494 INFO  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:deleteAsUser(828)) - Deleting absolute path : 
> /grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1530854311763_0006/container_e20_1530854311763_0006_01_02/launch_container.sh
> 2018-07-06 05:22:31,500 INFO  nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:deleteAsUser(828)) - Deleting absolute path : 
> /grid/0/hadoop/yarn/local/usercache/hrt_qa/appcache/application_1530854311763_0006/container_e20_1530854311763_0006_01_02/container_tokens
> 2018-07-06 05:22:31,510 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_01 transitioned from KILLING to 
> CONTAINER_CLEANEDUP_AFTER_KILL
> 2018-07-06 05:22:31,510 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_02 transitioned from KILLING to 
> CONTAINER_CLEANEDUP_AFTER_KILL
> 2018-07-06 05:22:31,512 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_01 transitioned from 
> CONTAINER_CLEANEDUP_AFTER_KILL to DONE
> 2018-07-06 05:22:31,513 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0006_01_02 transitioned from 
> CONTAINER_CLEANEDUP_AFTER_KILL to DONE
> 2018-07-06 05:22:38,955 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(2093)) - Container 
> container_e20_1530854311763_0007_01_02 transitioned from NEW to SCHEDULED
> {Code}
> New container requesting for GPU fails to launch
> {code}
> 2018-07-06 05:22:39,048 ERROR nodemanager.LinuxContainerExecutor 
> (LinuxContainerExecutor.java:handleLaunchForLaunchType(550)) - 
> ResourceHandlerChain.preStart() failed!
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException:
>  Failed to find enough GPUs, 
> requestor=container_e20_1530854311763_0007_01_02, #RequestedGPUs=2, 
> #availableGpus=1
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceAllocator.internalAssignGpus(GpuResourceAllocator.java:225)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceAllocator.assignGpus(GpuResourceAllocator.java:173)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.gpu.GpuResourceHandlerImpl.preStart(GpuResourceHandlerImpl.java:98)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerChain.preStart(ResourceHandlerChain.java:75)
>   at 
> 

[jira] [Updated] (YARN-8301) Yarn Service Upgrade: Add documentation

2018-07-19 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8301:

Attachment: YARN-8301.006.patch

> Yarn Service Upgrade: Add documentation
> ---
>
> Key: YARN-8301
> URL: https://issues.apache.org/jira/browse/YARN-8301
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8301.001.patch, YARN-8301.002.patch, 
> YARN-8301.003.patch, YARN-8301.004.patch, YARN-8301.005.patch, 
> YARN-8301.006.patch
>
>
> Add documentation for yarn service upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8301) Yarn Service Upgrade: Add documentation

2018-07-19 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8301:

Attachment: YARN-8301.005.patch

> Yarn Service Upgrade: Add documentation
> ---
>
> Key: YARN-8301
> URL: https://issues.apache.org/jira/browse/YARN-8301
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8301.001.patch, YARN-8301.002.patch, 
> YARN-8301.003.patch, YARN-8301.004.patch, YARN-8301.005.patch
>
>
> Add documentation for yarn service upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8542) Yarn Service: Add component name to container json

2018-07-18 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8542:

Description: 
GET app/v1/services/{{service-name}}/component-instances returns a list of 
containers with YARN-8299.
{code:java}
[
{
"id": "container_1531508836237_0001_01_03",
"ip": "192.168.2.51",
"hostname": "HW12119.local",
"state": "READY",
"launch_time": 1531509014497,
"bare_host": "192.168.2.51",
"component_instance_name": "sleeper-1"
},
{
"id": "container_1531508836237_0001_01_02",
"ip": "192.168.2.51",
"hostname": "HW12119.local",
"state": "READY",
"launch_time": 1531509013492,
"bare_host": "192.168.2.51",
"component_instance_name": "sleeper-0"
}
]{code}
{{component_name}} is not part of container json, so it is hard to tell which 
component an instance belongs to. 
To fix this, will change the format of returned containers to:
{code:java}
[
  {
"name": "ping",
"containers": [
  {
"bare_host": "eyang-4.openstacklocal",
"component_instance_name": "ping-0",
"hostname": "ping-0.qqq.hbase.ycluster",
"id": "container_1531765479645_0002_01_02",
"ip": "172.26.111.21",
"launch_time": 1531767377301,
"state": "READY"
  },
  {
"bare_host": "eyang-4.openstacklocal",
"component_instance_name": "ping-1",
"hostname": "ping-1.qqq.hbase.ycluster",
"id": "container_1531765479645_0002_01_07",
"ip": "172.26.111.21",
"launch_time": 1531767410395,
"state": "RUNNING_BUT_UNREADY"
  }
]
  },
  {
"name": "sleep",
"containers": [
  {
"bare_host": "eyang-5.openstacklocal",
"component_instance_name": "sleep-0",
"hostname": "sleep-0.qqq.hbase.ycluster",
"id": "container_1531765479645_0002_01_04",
"ip": "172.26.111.20",
"launch_time": 1531767377710,
"state": "READY"
  },
  {
"bare_host": "eyang-4.openstacklocal",
"component_instance_name": "sleep-1",
"hostname": "sleep-1.qqq.hbase.ycluster",
"id": "container_1531765479645_0002_01_05",
"ip": "172.26.111.21",
"launch_time": 1531767378303,
"state": "READY"
  }
]
  }
]{code}
 

  was:
GET app/v1/services/{\{service-name}}/component-instances returns a list of 
containers with YARN-8299.
{code:java}
[
{
"id": "container_1531508836237_0001_01_03",
"ip": "192.168.2.51",
"hostname": "HW12119.local",
"state": "READY",
"launch_time": 1531509014497,
"bare_host": "192.168.2.51",
"component_instance_name": "sleeper-1"
},
{
"id": "container_1531508836237_0001_01_02",
"ip": "192.168.2.51",
"hostname": "HW12119.local",
"state": "READY",
"launch_time": 1531509013492,
"bare_host": "192.168.2.51",
"component_instance_name": "sleeper-0"
}
]{code}
{{component_name}} is not part of container json, so it is hard to tell which 
component an instance belongs to. 
Change the list of containers return 


> Yarn Service: Add component name to container json
> --
>
> Key: YARN-8542
> URL: https://issues.apache.org/jira/browse/YARN-8542
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>
> GET app/v1/services/{{service-name}}/component-instances returns a list of 
> containers with YARN-8299.
> {code:java}
> [
> {
> "id": "container_1531508836237_0001_01_03",
> "ip": "192.168.2.51",
> "hostname": "HW12119.local",
> "state": "READY",
> "launch_time": 1531509014497,
> "bare_host": "192.168.2.51",
> "component_instance_name": "sleeper-1"
> },
> {
> "id": "container_1531508836237_0001_01_02",
> "ip": "192.168.2.51",
> "hostname": "HW12119.local",
> "state": "READY",
> "launch_time": 1531509013492,
> "bare_host": "192.168.2.51",
> "component_instance_name": "sleeper-0"
> }
> ]{code}
> {{component_name}} is not part of container json, so it is hard to tell which 
> component an instance belongs to. 
> To fix this, will change the format of returned containers to:
> {code:java}
> [
>   {
> "name": "ping",
> "containers": [
>   {
> "bare_host": "eyang-4.openstacklocal",
> "component_instance_name": "ping-0",
> "hostname": "ping-0.qqq.hbase.ycluster",
> "id": "container_1531765479645_0002_01_02",
> "ip": "172.26.111.21",
> "launch_time": 1531767377301,
> "state": "READY"
>   },
>   {
> "bare_host": "eyang-4.openstacklocal",
> "component_instance_name": "ping-1",
> "hostname": "ping-1.qqq.hbase.ycluster",
> "id": "container_1531765479645_0002_01_07",
> "ip": "172.26.111.21",
> "launch_time": 1531767410395,
> "state": "RUNNING_BUT_UNREADY"
>   }
> ]
>   },
>   {
>  

[jira] [Commented] (YARN-8301) Yarn Service Upgrade: Add documentation

2018-07-18 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548405#comment-16548405
 ] 

Chandni Singh commented on YARN-8301:
-

Addressed [~gsaha]'s comments in patch 4. 

I didn't find many trailing-whitespace issues. Let me know if you still see them.

> Yarn Service Upgrade: Add documentation
> ---
>
> Key: YARN-8301
> URL: https://issues.apache.org/jira/browse/YARN-8301
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8301.001.patch, YARN-8301.002.patch, 
> YARN-8301.003.patch, YARN-8301.004.patch
>
>
> Add documentation for yarn service upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8301) Yarn Service Upgrade: Add documentation

2018-07-18 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8301:

Attachment: YARN-8301.004.patch

> Yarn Service Upgrade: Add documentation
> ---
>
> Key: YARN-8301
> URL: https://issues.apache.org/jira/browse/YARN-8301
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8301.001.patch, YARN-8301.002.patch, 
> YARN-8301.003.patch, YARN-8301.004.patch
>
>
> Add documentation for yarn service upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8301) Yarn Service Upgrade: Add documentation

2018-07-18 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548384#comment-16548384
 ] 

Chandni Singh commented on YARN-8301:
-

{quote}
 In line 148 do we need the line "name": "sleeper-service" in the JSON spec for 
version 1.0.1 of the service.
{quote}
No, will remove it. 

> Yarn Service Upgrade: Add documentation
> ---
>
> Key: YARN-8301
> URL: https://issues.apache.org/jira/browse/YARN-8301
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8301.001.patch, YARN-8301.002.patch, 
> YARN-8301.003.patch
>
>
> Add documentation for yarn service upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8542) Yarn Service: Add component name to container json

2018-07-18 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8542:

Description: 
GET app/v1/services/{{service-name}}/component-instances returns a list of 
containers with YARN-8299.
{code:java}
[
{
"id": "container_1531508836237_0001_01_03",
"ip": "192.168.2.51",
"hostname": "HW12119.local",
"state": "READY",
"launch_time": 1531509014497,
"bare_host": "192.168.2.51",
"component_instance_name": "sleeper-1"
},
{
"id": "container_1531508836237_0001_01_02",
"ip": "192.168.2.51",
"hostname": "HW12119.local",
"state": "READY",
"launch_time": 1531509013492,
"bare_host": "192.168.2.51",
"component_instance_name": "sleeper-0"
}
]{code}
{{component_name}} is not part of container json, so it is hard to tell which 
component an instance belongs to. 
Change the list of containers returned.

  was:
GET app/v1/services/{{service-name}}/component-instances returns a list of 
containers with YARN-8299.
{code:java}
[
{
"id": "container_1531508836237_0001_01_03",
"ip": "192.168.2.51",
"hostname": "HW12119.local",
"state": "READY",
"launch_time": 1531509014497,
"bare_host": "192.168.2.51",
"component_instance_name": "sleeper-1"
},
{
"id": "container_1531508836237_0001_01_02",
"ip": "192.168.2.51",
"hostname": "HW12119.local",
"state": "READY",
"launch_time": 1531509013492,
"bare_host": "192.168.2.51",
"component_instance_name": "sleeper-0"
}
]{code}
{{component_name}} is not part of container json, so it is hard to tell which 
component an instance belongs to. 


> Yarn Service: Add component name to container json
> --
>
> Key: YARN-8542
> URL: https://issues.apache.org/jira/browse/YARN-8542
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>
> GET app/v1/services/{{service-name}}/component-instances returns a list of 
> containers with YARN-8299.
> {code:java}
> [
> {
> "id": "container_1531508836237_0001_01_03",
> "ip": "192.168.2.51",
> "hostname": "HW12119.local",
> "state": "READY",
> "launch_time": 1531509014497,
> "bare_host": "192.168.2.51",
> "component_instance_name": "sleeper-1"
> },
> {
> "id": "container_1531508836237_0001_01_02",
> "ip": "192.168.2.51",
> "hostname": "HW12119.local",
> "state": "READY",
> "launch_time": 1531509013492,
> "bare_host": "192.168.2.51",
> "component_instance_name": "sleeper-0"
> }
> ]{code}
> {{component_name}} is not part of container json, so it is hard to tell which 
> component an instance belongs to. 
> Change the list of containers return 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8542) Yarn Service: Add component name to container json

2018-07-18 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548379#comment-16548379
 ] 

Chandni Singh commented on YARN-8542:
-

[~gsaha] Ok. That sounds reasonable. Will change it to the format you have 
proposed.

> Yarn Service: Add component name to container json
> --
>
> Key: YARN-8542
> URL: https://issues.apache.org/jira/browse/YARN-8542
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>
> GET app/v1/services/{{service-name}}/component-instances returns a list of 
> containers with YARN-8299.
> {code:java}
> [
> {
> "id": "container_1531508836237_0001_01_03",
> "ip": "192.168.2.51",
> "hostname": "HW12119.local",
> "state": "READY",
> "launch_time": 1531509014497,
> "bare_host": "192.168.2.51",
> "component_instance_name": "sleeper-1"
> },
> {
> "id": "container_1531508836237_0001_01_02",
> "ip": "192.168.2.51",
> "hostname": "HW12119.local",
> "state": "READY",
> "launch_time": 1531509013492,
> "bare_host": "192.168.2.51",
> "component_instance_name": "sleeper-0"
> }
> ]{code}
> {{component_name}} is not part of container json, so it is hard to tell which 
> component an instance belongs to. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8301) Yarn Service Upgrade: Add documentation

2018-07-18 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8301:

Attachment: YARN-8301.003.patch

> Yarn Service Upgrade: Add documentation
> ---
>
> Key: YARN-8301
> URL: https://issues.apache.org/jira/browse/YARN-8301
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8301.001.patch, YARN-8301.002.patch, 
> YARN-8301.003.patch
>
>
> Add documentation for yarn service upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8301) Yarn Service Upgrade: Add documentation

2018-07-18 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548358#comment-16548358
 ] 

Chandni Singh commented on YARN-8301:
-

Addressed offline comments in patch 3

> Yarn Service Upgrade: Add documentation
> ---
>
> Key: YARN-8301
> URL: https://issues.apache.org/jira/browse/YARN-8301
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8301.001.patch, YARN-8301.002.patch, 
> YARN-8301.003.patch
>
>
> Add documentation for yarn service upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8301) Yarn Service Upgrade: Add documentation

2018-07-18 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8301:

Attachment: YARN-8301.002.patch

> Yarn Service Upgrade: Add documentation
> ---
>
> Key: YARN-8301
> URL: https://issues.apache.org/jira/browse/YARN-8301
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8301.001.patch, YARN-8301.002.patch
>
>
> Add documentation for yarn service upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8542) Yarn Service: Add component name to container json

2018-07-18 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8542:

Description: 
GET app/v1/services/{{service-name}}/component-instances returns a list of 
containers with YARN-8299.
{code:java}
[
{
"id": "container_1531508836237_0001_01_03",
"ip": "192.168.2.51",
"hostname": "HW12119.local",
"state": "READY",
"launch_time": 1531509014497,
"bare_host": "192.168.2.51",
"component_instance_name": "sleeper-1"
},
{
"id": "container_1531508836237_0001_01_02",
"ip": "192.168.2.51",
"hostname": "HW12119.local",
"state": "READY",
"launch_time": 1531509013492,
"bare_host": "192.168.2.51",
"component_instance_name": "sleeper-0"
}
]{code}
{{component_name}} is not part of container json, so it is hard to tell which 
component an instance belongs to. 

  was:
In YARN-8299, CLI for query container status is implemented to display 
containers in a flat list.  It might be helpful to display component structure 
hierarchy like this:

{code}
[
  {
"name": "ping",
"containers": [
  {
"bare_host": "eyang-4.openstacklocal",
"component_instance_name": "ping-0",
"hostname": "ping-0.qqq.hbase.ycluster",
"id": "container_1531765479645_0002_01_02",
"ip": "172.26.111.21",
"launch_time": 1531767377301,
"state": "READY"
  },
  {
"bare_host": "eyang-4.openstacklocal",
"component_instance_name": "ping-1",
"hostname": "ping-1.qqq.hbase.ycluster",
"id": "container_1531765479645_0002_01_07",
"ip": "172.26.111.21",
"launch_time": 1531767410395,
"state": "RUNNING_BUT_UNREADY"
  }
]
  },
  {
"name": "sleep",
"containers": [
  {
"bare_host": "eyang-5.openstacklocal",
"component_instance_name": "sleep-0",
"hostname": "sleep-0.qqq.hbase.ycluster",
"id": "container_1531765479645_0002_01_04",
"ip": "172.26.111.20",
"launch_time": 1531767377710,
"state": "READY"
  },
  {
"bare_host": "eyang-4.openstacklocal",
"component_instance_name": "sleep-1",
"hostname": "sleep-1.qqq.hbase.ycluster",
"id": "container_1531765479645_0002_01_05",
"ip": "172.26.111.21",
"launch_time": 1531767378303,
"state": "READY"
  }
]
  }
]
{code}


> Yarn Service: Add component name to container json
> --
>
> Key: YARN-8542
> URL: https://issues.apache.org/jira/browse/YARN-8542
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>
> GET app/v1/services/{{service-name}}/component-instances returns a list of 
> containers with YARN-8299.
> {code:java}
> [
> {
> "id": "container_1531508836237_0001_01_03",
> "ip": "192.168.2.51",
> "hostname": "HW12119.local",
> "state": "READY",
> "launch_time": 1531509014497,
> "bare_host": "192.168.2.51",
> "component_instance_name": "sleeper-1"
> },
> {
> "id": "container_1531508836237_0001_01_02",
> "ip": "192.168.2.51",
> "hostname": "HW12119.local",
> "state": "READY",
> "launch_time": 1531509013492,
> "bare_host": "192.168.2.51",
> "component_instance_name": "sleeper-0"
> }
> ]{code}
> {{component_name}} is not part of container json, so it is hard to tell which 
> component an instance belongs to. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8542) Yarn Service: Add component name to container json

2018-07-18 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548248#comment-16548248
 ] 

Chandni Singh commented on YARN-8542:
-

[~gsaha] 

I am not in favor of the below format:
{code:java}
{
"name": "sleep",
"containers": [
  {
"bare_host": "eyang-5.openstacklocal",
"component_instance_name": "sleep-0",
"hostname": "sleep-0.qqq.hbase.ycluster",
"id": "container_1531765479645_0002_01_04",
"ip": "172.26.111.20",
"launch_time": 1531767377710,
"state": "READY"
  }
  ]
}{code}
It doesn't follow the convention. The request is for containers, so it should 
return a list of containers. I prefer adding component_name to the container 
JSON.

Also, it is easier for users to further filter a flat list than a nested JSON 
structure.
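For illustration, a single entry of the existing flat list with the proposed 
field added could look like the sketch below (values are borrowed from the 
sleep-0 entry used in this issue; the exact field name and its position in the 
JSON are only a proposal here, not a committed format):
{code:java}
// Hypothetical sketch, not from a patch: flat container entry plus component_name
[
  {
    "id": "container_1531765479645_0002_01_04",
    "ip": "172.26.111.20",
    "hostname": "sleep-0.qqq.hbase.ycluster",
    "bare_host": "eyang-5.openstacklocal",
    "state": "READY",
    "launch_time": 1531767377710,
    "component_instance_name": "sleep-0",
    "component_name": "sleep"
  }
]{code}
This keeps the response a plain list, so clients that already iterate over 
containers keep working and can simply filter on the new field.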

> Yarn Service: Add component name to container json
> --
>
> Key: YARN-8542
> URL: https://issues.apache.org/jira/browse/YARN-8542
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>
> In YARN-8299, CLI for query container status is implemented to display 
> containers in a flat list.  It might be helpful to display component 
> structure hierarchy like this:
> {code}
> [
>   {
> "name": "ping",
> "containers": [
>   {
> "bare_host": "eyang-4.openstacklocal",
> "component_instance_name": "ping-0",
> "hostname": "ping-0.qqq.hbase.ycluster",
> "id": "container_1531765479645_0002_01_02",
> "ip": "172.26.111.21",
> "launch_time": 1531767377301,
> "state": "READY"
>   },
>   {
> "bare_host": "eyang-4.openstacklocal",
> "component_instance_name": "ping-1",
> "hostname": "ping-1.qqq.hbase.ycluster",
> "id": "container_1531765479645_0002_01_07",
> "ip": "172.26.111.21",
> "launch_time": 1531767410395,
> "state": "RUNNING_BUT_UNREADY"
>   }
> ]
>   },
>   {
> "name": "sleep",
> "containers": [
>   {
> "bare_host": "eyang-5.openstacklocal",
> "component_instance_name": "sleep-0",
> "hostname": "sleep-0.qqq.hbase.ycluster",
> "id": "container_1531765479645_0002_01_04",
> "ip": "172.26.111.20",
> "launch_time": 1531767377710,
> "state": "READY"
>   },
>   {
> "bare_host": "eyang-4.openstacklocal",
> "component_instance_name": "sleep-1",
> "hostname": "sleep-1.qqq.hbase.ycluster",
> "id": "container_1531765479645_0002_01_05",
> "ip": "172.26.111.21",
> "launch_time": 1531767378303,
> "state": "READY"
>   }
> ]
>   }
> ]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params

2018-07-16 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16545691#comment-16545691
 ] 

Chandni Singh commented on YARN-8299:
-

[~eyang] created YARN-8542 for the improvement you suggested. 

> Yarn Service Upgrade: Add GET APIs that returns instances matching query 
> params
> ---
>
> Key: YARN-8299
> URL: https://issues.apache.org/jira/browse/YARN-8299
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8299.001.patch, YARN-8299.002.patch, 
> YARN-8299.003.patch, YARN-8299.004.patch, YARN-8299.005.patch
>
>
> We need APIs that returns containers that match the query params. These are 
> needed so that we can find out what containers have been upgraded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8542) Yarn Service: Add component name to container json

2018-07-16 Thread Chandni Singh (JIRA)
Chandni Singh created YARN-8542:
---

 Summary: Yarn Service: Add component name to container json
 Key: YARN-8542
 URL: https://issues.apache.org/jira/browse/YARN-8542
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Chandni Singh
Assignee: Chandni Singh






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params

2018-07-13 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16543883#comment-16543883
 ] 

Chandni Singh commented on YARN-8299:
-

Last Jenkins run is against patch 3 which has a broken test. Triggered it to 
run against patch 5.

> Yarn Service Upgrade: Add GET APIs that returns instances matching query 
> params
> ---
>
> Key: YARN-8299
> URL: https://issues.apache.org/jira/browse/YARN-8299
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8299.001.patch, YARN-8299.002.patch, 
> YARN-8299.003.patch, YARN-8299.004.patch, YARN-8299.005.patch
>
>
> We need APIs that returns containers that match the query params. These are 
> needed so that we can find out what containers have been upgraded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params

2018-07-13 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8299:

Attachment: YARN-8299.005.patch

> Yarn Service Upgrade: Add GET APIs that returns instances matching query 
> params
> ---
>
> Key: YARN-8299
> URL: https://issues.apache.org/jira/browse/YARN-8299
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8299.001.patch, YARN-8299.002.patch, 
> YARN-8299.003.patch, YARN-8299.004.patch, YARN-8299.005.patch
>
>
> We need APIs that returns containers that match the query params. These are 
> needed so that we can find out what containers have been upgraded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8301) Yarn Service Upgrade: Add documentation

2018-07-13 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16543831#comment-16543831
 ] 

Chandni Singh commented on YARN-8301:
-

[~gsaha] [~eyang] 

Added a brief document on upgrade. Please review. Thanks.

> Yarn Service Upgrade: Add documentation
> ---
>
> Key: YARN-8301
> URL: https://issues.apache.org/jira/browse/YARN-8301
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8301.001.patch
>
>
> Add documentation for yarn service upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8301) Yarn Service Upgrade: Add documentation

2018-07-13 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8301:

Attachment: YARN-8301.001.patch

> Yarn Service Upgrade: Add documentation
> ---
>
> Key: YARN-8301
> URL: https://issues.apache.org/jira/browse/YARN-8301
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8301.001.patch
>
>
> Add documentation for yarn service upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params

2018-07-13 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8299:

Attachment: YARN-8299.004.patch

> Yarn Service Upgrade: Add GET APIs that returns instances matching query 
> params
> ---
>
> Key: YARN-8299
> URL: https://issues.apache.org/jira/browse/YARN-8299
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8299.001.patch, YARN-8299.002.patch, 
> YARN-8299.003.patch, YARN-8299.004.patch
>
>
> We need APIs that returns containers that match the query params. These are 
> needed so that we can find out what containers have been upgraded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params

2018-07-13 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8299:

Attachment: YARN-8299.003.patch

> Yarn Service Upgrade: Add GET APIs that returns instances matching query 
> params
> ---
>
> Key: YARN-8299
> URL: https://issues.apache.org/jira/browse/YARN-8299
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8299.001.patch, YARN-8299.002.patch, 
> YARN-8299.003.patch
>
>
> We need APIs that returns containers that match the query params. These are 
> needed so that we can find out what containers have been upgraded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params

2018-07-13 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16543691#comment-16543691
 ] 

Chandni Singh commented on YARN-8299:
-

Confirming:

There are 3 filter options (a combined example follows the list):
 # component names: {{-components}}
 # version: {{-version}}
 # component instance states: {{-states}}
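A sketch of combining the three filters in one invocation (the service name 
{{test1}} and the component names come from examples elsewhere in this thread; 
the assumption that {{-components}} takes a comma-separated list, mirroring 
{{-states}}, is mine):
{code:java}
# Hypothetical combined invocation; flag values are illustrative only
yarn container -list test1 -components ping,sleep -version 1.0.0 -states READY,UPGRADING | python -m json.tool
{code}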

> Yarn Service Upgrade: Add GET APIs that returns instances matching query 
> params
> ---
>
> Key: YARN-8299
> URL: https://issues.apache.org/jira/browse/YARN-8299
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8299.001.patch, YARN-8299.002.patch
>
>
> We need APIs that returns containers that match the query params. These are 
> needed so that we can find out what containers have been upgraded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params

2018-07-13 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8299:

Attachment: YARN-8299.002.patch

> Yarn Service Upgrade: Add GET APIs that returns instances matching query 
> params
> ---
>
> Key: YARN-8299
> URL: https://issues.apache.org/jira/browse/YARN-8299
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8299.001.patch, YARN-8299.002.patch
>
>
> We need APIs that returns containers that match the query params. These are 
> needed so that we can find out what containers have been upgraded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params

2018-07-13 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16543510#comment-16543510
 ] 

Chandni Singh commented on YARN-8299:
-

{quote}

If user input a state that is not in the defined list, it throws ERROR 500 
error.  It would be nice to report ERROR 400 BAD REQUEST, and display possible 
states.  Since user can only input one state, would it make sense to change 
-states to -state?

{quote}

[~eyang] A user can input multiple states, for example:
{code:java}
yarn container -list test1 -states UPGRADING,NEEDS_UPGRADE | python -m json.tool{code}
 

> Yarn Service Upgrade: Add GET APIs that returns instances matching query 
> params
> ---
>
> Key: YARN-8299
> URL: https://issues.apache.org/jira/browse/YARN-8299
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8299.001.patch
>
>
> We need APIs that returns containers that match the query params. These are 
> needed so that we can find out what containers have been upgraded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params

2018-07-13 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16543442#comment-16543442
 ] 

Chandni Singh commented on YARN-8299:
-

TestAMRMClient passes locally for me. I am able to compile and run the tests of 
the hadoop-yarn-client, hadoop-yarn-services-core, and hadoop-yarn-services-api 
modules without any issues.
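For reference, a typical way to re-run that test locally might be the sketch 
below (it assumes the standard Hadoop source-tree layout and a working trunk 
build; it is not taken from the Jenkins job):
{code:java}
# Hypothetical local invocation of the single failing test
cd hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client
mvn test -Dtest=TestAMRMClient
{code}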

> Yarn Service Upgrade: Add GET APIs that returns instances matching query 
> params
> ---
>
> Key: YARN-8299
> URL: https://issues.apache.org/jira/browse/YARN-8299
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8299.001.patch
>
>
> We need APIs that returns containers that match the query params. These are 
> needed so that we can find out what containers have been upgraded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params

2018-07-12 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16542253#comment-16542253
 ] 

Chandni Singh edited comment on YARN-8299 at 7/12/18 10:17 PM:
---

[~eyang] [~gsaha] could you please review?

Command line to list instances:
{code:java}
yarn container -list test1 -states READY -version 1.0.0 | python -m json.tool{code}


was (Author: csingh):
[~eyang] [~gsaha] could you please review?

Command line to list instances:

yarn container -list test1 -states READY

> Yarn Service Upgrade: Add GET APIs that returns instances matching query 
> params
> ---
>
> Key: YARN-8299
> URL: https://issues.apache.org/jira/browse/YARN-8299
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8299.001.patch
>
>
> We need APIs that returns containers that match the query params. These are 
> needed so that we can find out what containers have been upgraded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params

2018-07-12 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16542253#comment-16542253
 ] 

Chandni Singh edited comment on YARN-8299 at 7/12/18 10:16 PM:
---

[~eyang] [~gsaha] could you please review?

Command line to list instances:

yarn container -list test1 -states READY


was (Author: csingh):
[~eyang] [~gsaha] could you please review?

Command line to 

yarn container -list test1 -states READY

> Yarn Service Upgrade: Add GET APIs that returns instances matching query 
> params
> ---
>
> Key: YARN-8299
> URL: https://issues.apache.org/jira/browse/YARN-8299
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8299.001.patch
>
>
> We need APIs that returns containers that match the query params. These are 
> needed so that we can find out what containers have been upgraded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params

2018-07-12 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16542253#comment-16542253
 ] 

Chandni Singh commented on YARN-8299:
-

[~eyang] [~gsaha] could you please review?

Command line to 

yarn container -list test1 -states READY

> Yarn Service Upgrade: Add GET APIs that returns instances matching query 
> params
> ---
>
> Key: YARN-8299
> URL: https://issues.apache.org/jira/browse/YARN-8299
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8299.001.patch
>
>
> We need APIs that returns containers that match the query params. These are 
> needed so that we can find out what containers have been upgraded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params

2018-07-12 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8299:

Attachment: YARN-8299.001.patch

> Yarn Service Upgrade: Add GET APIs that returns instances matching query 
> params
> ---
>
> Key: YARN-8299
> URL: https://issues.apache.org/jira/browse/YARN-8299
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8299.001.patch
>
>
> We need APIs that returns containers that match the query params. These are 
> needed so that we can find out what containers have been upgraded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params

2018-06-29 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8299:

Summary: Yarn Service Upgrade: Add GET APIs that returns instances matching 
query params  (was: Yarn Service Upgrade: Add GET APIs that returns 
components/instances matching query params)

> Yarn Service Upgrade: Add GET APIs that returns instances matching query 
> params
> ---
>
> Key: YARN-8299
> URL: https://issues.apache.org/jira/browse/YARN-8299
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>
> We need APIs that returns containers/components that match the query params. 
> These are needed so that we can find out what containers/components have been 
> upgraded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns instances matching query params

2018-06-29 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8299:

Description: We need APIs that returns containers that match the query 
params. These are needed so that we can find out what containers have been 
upgraded.  (was: We need APIs that returns containers/components that match the 
query params. These are needed so that we can find out what 
containers/components have been upgraded.)

> Yarn Service Upgrade: Add GET APIs that returns instances matching query 
> params
> ---
>
> Key: YARN-8299
> URL: https://issues.apache.org/jira/browse/YARN-8299
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>
> We need APIs that returns containers that match the query params. These are 
> needed so that we can find out what containers have been upgraded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8409) ActiveStandbyElectorBasedElectorService is failing with NPE

2018-06-28 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16526578#comment-16526578
 ] 

Chandni Singh commented on YARN-8409:
-

Thanks [~eyang] for reviewing and merging the patch.

> ActiveStandbyElectorBasedElectorService is failing with NPE
> ---
>
> Key: YARN-8409
> URL: https://issues.apache.org/jira/browse/YARN-8409
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Chandni Singh
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8409.002.patch
>
>
> In RM-HA env, kill ZK leader and then perform RM failover. 
> Sometimes, active RM gets NPE and fail to come up successfully
> {code:java}
> 2018-06-08 10:31:03,007 INFO  client.ZooKeeperSaslClient 
> (ZooKeeperSaslClient.java:run(289)) - Client will use GSSAPI as SASL 
> mechanism.
> 2018-06-08 10:31:03,008 INFO  zookeeper.ClientCnxn 
> (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server 
> xxx/xxx:2181. Will attempt to SASL-authenticate using Login Context section 
> 'Client'
> 2018-06-08 10:31:03,009 WARN  zookeeper.ClientCnxn 
> (ClientCnxn.java:run(1146)) - Session 0x0 for server null, unexpected error, 
> closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
> 2018-06-08 10:31:03,344 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService
>  failed in state INITED
> java.lang.NullPointerException
> at 
> org.apache.hadoop.ha.ActiveStandbyElector$3.run(ActiveStandbyElector.java:1033)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector$3.run(ActiveStandbyElector.java:1030)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1095)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1087)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.createWithRetries(ActiveStandbyElector.java:1030)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.ensureParentZNode(ActiveStandbyElector.java:347)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.serviceInit(ActiveStandbyElectorBasedElectorService.java:110)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:336)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1479)
> 2018-06-08 10:31:03,345 INFO  ha.ActiveStandbyElector 
> (ActiveStandbyElector.java:quitElection(409)) - Yielding from election{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8409) ActiveStandbyElectorBasedElectorService is failing with NPE

2018-06-27 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16525688#comment-16525688
 ] 

Chandni Singh commented on YARN-8409:
-

[~eyang] could you please review patch 2?

> ActiveStandbyElectorBasedElectorService is failing with NPE
> ---
>
> Key: YARN-8409
> URL: https://issues.apache.org/jira/browse/YARN-8409
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8409.002.patch
>
>
> In RM-HA env, kill ZK leader and then perform RM failover. 
> Sometimes, active RM gets NPE and fail to come up successfully
> {code:java}
> 2018-06-08 10:31:03,007 INFO  client.ZooKeeperSaslClient 
> (ZooKeeperSaslClient.java:run(289)) - Client will use GSSAPI as SASL 
> mechanism.
> 2018-06-08 10:31:03,008 INFO  zookeeper.ClientCnxn 
> (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server 
> xxx/xxx:2181. Will attempt to SASL-authenticate using Login Context section 
> 'Client'
> 2018-06-08 10:31:03,009 WARN  zookeeper.ClientCnxn 
> (ClientCnxn.java:run(1146)) - Session 0x0 for server null, unexpected error, 
> closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
> 2018-06-08 10:31:03,344 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService
>  failed in state INITED
> java.lang.NullPointerException
> at 
> org.apache.hadoop.ha.ActiveStandbyElector$3.run(ActiveStandbyElector.java:1033)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector$3.run(ActiveStandbyElector.java:1030)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1095)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1087)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.createWithRetries(ActiveStandbyElector.java:1030)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.ensureParentZNode(ActiveStandbyElector.java:347)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.serviceInit(ActiveStandbyElectorBasedElectorService.java:110)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:336)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1479)
> 2018-06-08 10:31:03,345 INFO  ha.ActiveStandbyElector 
> (ActiveStandbyElector.java:quitElection(409)) - Yielding from election{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8409) ActiveStandbyElectorBasedElectorService is failing with NPE

2018-06-27 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8409:

Attachment: (was: YARN-8409.001.patch)

> ActiveStandbyElectorBasedElectorService is failing with NPE
> ---
>
> Key: YARN-8409
> URL: https://issues.apache.org/jira/browse/YARN-8409
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8409.002.patch
>
>
> In RM-HA env, kill ZK leader and then perform RM failover. 
> Sometimes, active RM gets NPE and fail to come up successfully
> {code:java}
> 2018-06-08 10:31:03,007 INFO  client.ZooKeeperSaslClient 
> (ZooKeeperSaslClient.java:run(289)) - Client will use GSSAPI as SASL 
> mechanism.
> 2018-06-08 10:31:03,008 INFO  zookeeper.ClientCnxn 
> (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server 
> xxx/xxx:2181. Will attempt to SASL-authenticate using Login Context section 
> 'Client'
> 2018-06-08 10:31:03,009 WARN  zookeeper.ClientCnxn 
> (ClientCnxn.java:run(1146)) - Session 0x0 for server null, unexpected error, 
> closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
> 2018-06-08 10:31:03,344 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService
>  failed in state INITED
> java.lang.NullPointerException
> at 
> org.apache.hadoop.ha.ActiveStandbyElector$3.run(ActiveStandbyElector.java:1033)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector$3.run(ActiveStandbyElector.java:1030)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1095)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1087)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.createWithRetries(ActiveStandbyElector.java:1030)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.ensureParentZNode(ActiveStandbyElector.java:347)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.serviceInit(ActiveStandbyElectorBasedElectorService.java:110)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:336)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1479)
> 2018-06-08 10:31:03,345 INFO  ha.ActiveStandbyElector 
> (ActiveStandbyElector.java:quitElection(409)) - Yielding from election{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8409) ActiveStandbyElectorBasedElectorService is failing with NPE

2018-06-27 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8409:

Attachment: YARN-8409.002.patch

> ActiveStandbyElectorBasedElectorService is failing with NPE
> ---
>
> Key: YARN-8409
> URL: https://issues.apache.org/jira/browse/YARN-8409
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8409.001.patch, YARN-8409.002.patch
>
>
> In RM-HA env, kill ZK leader and then perform RM failover. 
> Sometimes, active RM gets NPE and fail to come up successfully
> {code:java}
> 2018-06-08 10:31:03,007 INFO  client.ZooKeeperSaslClient 
> (ZooKeeperSaslClient.java:run(289)) - Client will use GSSAPI as SASL 
> mechanism.
> 2018-06-08 10:31:03,008 INFO  zookeeper.ClientCnxn 
> (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server 
> xxx/xxx:2181. Will attempt to SASL-authenticate using Login Context section 
> 'Client'
> 2018-06-08 10:31:03,009 WARN  zookeeper.ClientCnxn 
> (ClientCnxn.java:run(1146)) - Session 0x0 for server null, unexpected error, 
> closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
> 2018-06-08 10:31:03,344 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService
>  failed in state INITED
> java.lang.NullPointerException
> at 
> org.apache.hadoop.ha.ActiveStandbyElector$3.run(ActiveStandbyElector.java:1033)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector$3.run(ActiveStandbyElector.java:1030)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1095)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1087)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.createWithRetries(ActiveStandbyElector.java:1030)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.ensureParentZNode(ActiveStandbyElector.java:347)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.serviceInit(ActiveStandbyElectorBasedElectorService.java:110)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:336)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1479)
> 2018-06-08 10:31:03,345 INFO  ha.ActiveStandbyElector 
> (ActiveStandbyElector.java:quitElection(409)) - Yielding from election{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8409) ActiveStandbyElectorBasedElectorService is failing with NPE

2018-06-26 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8409:

Attachment: YARN-8409.001.patch

> ActiveStandbyElectorBasedElectorService is failing with NPE
> ---
>
> Key: YARN-8409
> URL: https://issues.apache.org/jira/browse/YARN-8409
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8409.001.patch
>
>
> In RM-HA env, kill ZK leader and then perform RM failover. 
> Sometimes, active RM gets NPE and fail to come up successfully
> {code:java}
> 2018-06-08 10:31:03,007 INFO  client.ZooKeeperSaslClient 
> (ZooKeeperSaslClient.java:run(289)) - Client will use GSSAPI as SASL 
> mechanism.
> 2018-06-08 10:31:03,008 INFO  zookeeper.ClientCnxn 
> (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server 
> xxx/xxx:2181. Will attempt to SASL-authenticate using Login Context section 
> 'Client'
> 2018-06-08 10:31:03,009 WARN  zookeeper.ClientCnxn 
> (ClientCnxn.java:run(1146)) - Session 0x0 for server null, unexpected error, 
> closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
> 2018-06-08 10:31:03,344 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService
>  failed in state INITED
> java.lang.NullPointerException
> at 
> org.apache.hadoop.ha.ActiveStandbyElector$3.run(ActiveStandbyElector.java:1033)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector$3.run(ActiveStandbyElector.java:1030)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1095)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1087)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.createWithRetries(ActiveStandbyElector.java:1030)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.ensureParentZNode(ActiveStandbyElector.java:347)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.serviceInit(ActiveStandbyElectorBasedElectorService.java:110)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:336)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1479)
> 2018-06-08 10:31:03,345 INFO  ha.ActiveStandbyElector 
> (ActiveStandbyElector.java:quitElection(409)) - Yielding from election{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8409) ActiveStandbyElectorBasedElectorService is failing with NPE

2018-06-26 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524290#comment-16524290
 ] 

Chandni Singh edited comment on YARN-8409 at 6/27/18 12:51 AM:
---

This happens when the RM is started immediately after the ZooKeeper leader is 
killed. The {{zkClient}} reference in {{ActiveStandbyElector}} is null, which 
causes the NPE.

Below is the chain of calls:
 # In the {{ActiveStandbyElector}} constructor, at line 274, 
{{reEstablishSession()}} is invoked.
 # {{reEstablishSession}} tries to create a ZooKeeper connection at line 825.
 # {{createConnection}} calls {{connectToZookeeper}} at line 850 to initialize 
{{zkClient}}.
 # However, {{connectToZookeeper}} throws an IOException because of a session 
timeout.
 # {{zkClient}} never gets initialized and remains {{null}}.

{{ActiveStandbyElectorBasedElectorService}} currently doesn't check whether the 
elector is connected to ZooKeeper and executes {{elector.ensureParentZNode()}}, 
which then throws the NPE.
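
To make the failure mode concrete, below is a minimal standalone sketch of the 
same pattern (the class and method names are invented for illustration; this is 
not the Hadoop code nor the fix): the constructor tolerates the connect 
failure, the client field stays {{null}}, and a later call dereferences it.
{code:java}
// Invented names; standalone illustration of the pattern described above.
class ElectorSketch {
  private Object zkClient; // stays null when the initial connect fails

  ElectorSketch() {
    try {
      zkClient = connect(); // analogous to reEstablishSession() -> connectToZookeeper()
    } catch (Exception e) {
      // the failure is tolerated here, so construction still succeeds
    }
  }

  void ensureParentZNode() {
    // analogous step: uses zkClient without checking the connection first
    zkClient.hashCode(); // NullPointerException when connect() failed above
  }

  private Object connect() throws Exception {
    throw new Exception("session timeout"); // simulate the ZK leader being down
  }
}{code}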


was (Author: csingh):
This happens when RM is started immediately after killing zookeeper leader. The 
{{zkClient}} is null. 

> ActiveStandbyElectorBasedElectorService is failing with NPE
> ---
>
> Key: YARN-8409
> URL: https://issues.apache.org/jira/browse/YARN-8409
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Chandni Singh
>Priority: Major
>
> In RM-HA env, kill ZK leader and then perform RM failover. 
> Sometimes, active RM gets NPE and fail to come up successfully
> {code:java}
> 2018-06-08 10:31:03,007 INFO  client.ZooKeeperSaslClient 
> (ZooKeeperSaslClient.java:run(289)) - Client will use GSSAPI as SASL 
> mechanism.
> 2018-06-08 10:31:03,008 INFO  zookeeper.ClientCnxn 
> (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server 
> xxx/xxx:2181. Will attempt to SASL-authenticate using Login Context section 
> 'Client'
> 2018-06-08 10:31:03,009 WARN  zookeeper.ClientCnxn 
> (ClientCnxn.java:run(1146)) - Session 0x0 for server null, unexpected error, 
> closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
> 2018-06-08 10:31:03,344 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService
>  failed in state INITED
> java.lang.NullPointerException
> at 
> org.apache.hadoop.ha.ActiveStandbyElector$3.run(ActiveStandbyElector.java:1033)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector$3.run(ActiveStandbyElector.java:1030)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1095)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1087)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.createWithRetries(ActiveStandbyElector.java:1030)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.ensureParentZNode(ActiveStandbyElector.java:347)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.serviceInit(ActiveStandbyElectorBasedElectorService.java:110)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:336)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1479)
> 2018-06-08 10:31:03,345 INFO  ha.ActiveStandbyElector 
> (ActiveStandbyElector.java:quitElection(409)) - Yielding from election{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8409) ActiveStandbyElectorBasedElectorService is failing with NPE

2018-06-26 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16524290#comment-16524290
 ] 

Chandni Singh commented on YARN-8409:
-

This happens when RM is started immediately after killing zookeeper leader. The 
{{zkClient}} is null. 

> ActiveStandbyElectorBasedElectorService is failing with NPE
> ---
>
> Key: YARN-8409
> URL: https://issues.apache.org/jira/browse/YARN-8409
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Chandni Singh
>Priority: Major
>
> In an RM-HA env, kill the ZK leader and then perform an RM failover. 
> Sometimes, the active RM gets an NPE and fails to come up successfully.
> {code:java}
> 2018-06-08 10:31:03,007 INFO  client.ZooKeeperSaslClient 
> (ZooKeeperSaslClient.java:run(289)) - Client will use GSSAPI as SASL 
> mechanism.
> 2018-06-08 10:31:03,008 INFO  zookeeper.ClientCnxn 
> (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server 
> xxx/xxx:2181. Will attempt to SASL-authenticate using Login Context section 
> 'Client'
> 2018-06-08 10:31:03,009 WARN  zookeeper.ClientCnxn 
> (ClientCnxn.java:run(1146)) - Session 0x0 for server null, unexpected error, 
> closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
> 2018-06-08 10:31:03,344 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService
>  failed in state INITED
> java.lang.NullPointerException
> at 
> org.apache.hadoop.ha.ActiveStandbyElector$3.run(ActiveStandbyElector.java:1033)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector$3.run(ActiveStandbyElector.java:1030)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1095)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1087)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.createWithRetries(ActiveStandbyElector.java:1030)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.ensureParentZNode(ActiveStandbyElector.java:347)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.serviceInit(ActiveStandbyElectorBasedElectorService.java:110)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:336)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1479)
> 2018-06-08 10:31:03,345 INFO  ha.ActiveStandbyElector 
> (ActiveStandbyElector.java:quitElection(409)) - Yielding from election{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-8409) ActiveStandbyElectorBasedElectorService is failing with NPE

2018-06-26 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh reassigned YARN-8409:
---

Assignee: Chandni Singh

> ActiveStandbyElectorBasedElectorService is failing with NPE
> ---
>
> Key: YARN-8409
> URL: https://issues.apache.org/jira/browse/YARN-8409
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Yesha Vora
>Assignee: Chandni Singh
>Priority: Major
>
> In an RM-HA env, kill the ZK leader and then perform an RM failover. 
> Sometimes, the active RM gets an NPE and fails to come up successfully.
> {code:java}
> 2018-06-08 10:31:03,007 INFO  client.ZooKeeperSaslClient 
> (ZooKeeperSaslClient.java:run(289)) - Client will use GSSAPI as SASL 
> mechanism.
> 2018-06-08 10:31:03,008 INFO  zookeeper.ClientCnxn 
> (ClientCnxn.java:logStartConnect(1019)) - Opening socket connection to server 
> xxx/xxx:2181. Will attempt to SASL-authenticate using Login Context section 
> 'Client'
> 2018-06-08 10:31:03,009 WARN  zookeeper.ClientCnxn 
> (ClientCnxn.java:run(1146)) - Session 0x0 for server null, unexpected error, 
> closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> at 
> org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
> 2018-06-08 10:31:03,344 INFO  service.AbstractService 
> (AbstractService.java:noteFailure(267)) - Service 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService
>  failed in state INITED
> java.lang.NullPointerException
> at 
> org.apache.hadoop.ha.ActiveStandbyElector$3.run(ActiveStandbyElector.java:1033)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector$3.run(ActiveStandbyElector.java:1030)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1095)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1087)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.createWithRetries(ActiveStandbyElector.java:1030)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.ensureParentZNode(ActiveStandbyElector.java:347)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.serviceInit(ActiveStandbyElectorBasedElectorService.java:110)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:336)
> at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1479)
> 2018-06-08 10:31:03,345 INFO  ha.ActiveStandbyElector 
> (ActiveStandbyElector.java:quitElection(409)) - Yielding from election{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8458) Perform SLS testing and run TestCapacitySchedulerPerf on trunk

2018-06-26 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16522912#comment-16522912
 ] 

Chandni Singh edited comment on YARN-8458 at 6/26/18 6:52 PM:
--

* {{TestCapacitySchedulerPerf}} on branch-3.1
{code}
[INFO] Running 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPerf
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 373.312 
s - in 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPerf
[INFO] 

#ResourceTypes = 2. Avg of fastest 20: 34602.074
#ResourceTypes = 5. Avg of fastest 20: 25000.0
#ResourceTypes = 4. Avg of fastest 20: 26420.08
#ResourceTypes = 3. Avg of fastest 20: 27173.912
{code}

* {{TestCapacitySchedulerPerf}} on branch-3.0
{code}
[INFO] Running 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPerf
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 277.687 
s - in 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPerf

#ResourceTypes = 2. Avg of fastest 20: 35460.992
#ResourceTypes = 5. Avg of fastest 20: 28129.395
#ResourceTypes = 4. Avg of fastest 20: 29498.525
#ResourceTypes = 3. Avg of fastest 20: 31201.248
{code}

* {{TestCapacitySchedulerPerf}} on bf2b687
{code}
#ResourceTypes = 2. Avg of fastest 20: 30211.48
{code}



was (Author: csingh):
Result of running {{TestCapacitySchedulerPerf}} on branch-3.1
{code:java}
[INFO] Running 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPerf
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 373.312 
s - in 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPerf

#ResourceTypes = 2. Avg of fastest 20: 34602.074

#ResourceTypes = 5. Avg of fastest 20: 25000.0

#ResourceTypes = 4. Avg of fastest 20: 26420.08

#ResourceTypes = 3. Avg of fastest 20: 27173.912
{code}
Result of running {{TestCapacitySchedulerPerf}} on branch-3.0
{code:java}
[INFO] Running 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPerf
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 277.687 
s - in 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPerf

#ResourceTypes = 2. Avg of fastest 20: 35460.992

#ResourceTypes = 5. Avg of fastest 20: 28129.395

#ResourceTypes = 4. Avg of fastest 20: 29498.525

#ResourceTypes = 3. Avg of fastest 20: 31201.248
{code}

> Perform SLS testing and run TestCapacitySchedulerPerf on trunk
> --
>
> Key: YARN-8458
> URL: https://issues.apache.org/jira/browse/YARN-8458
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: sls_snapshot_cpu_snapshot_june_25.nps, 
> sls_snapshot_memory_snapshot_june_25.nps
>
>
> Run SLS test and TestCapacitySchedulerPerf



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8458) Perform SLS testing and run TestCapacitySchedulerPerf on trunk

2018-06-25 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16522912#comment-16522912
 ] 

Chandni Singh edited comment on YARN-8458 at 6/25/18 11:33 PM:
---

Result of running {{TestCapacitySchedulerPerf}} on branch-3.1
{code:java}
[INFO] Running 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPerf
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 373.312 
s - in 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPerf

#ResourceTypes = 2. Avg of fastest 20: 34602.074

#ResourceTypes = 5. Avg of fastest 20: 25000.0

#ResourceTypes = 4. Avg of fastest 20: 26420.08

#ResourceTypes = 3. Avg of fastest 20: 27173.912
{code}
Result of running {{TestCapacitySchedulerPerf}} on branch-3.0
{code:java}
[INFO] Running 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPerf
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 277.687 
s - in 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPerf

#ResourceTypes = 2. Avg of fastest 20: 35460.992

#ResourceTypes = 5. Avg of fastest 20: 28129.395

#ResourceTypes = 4. Avg of fastest 20: 29498.525

#ResourceTypes = 3. Avg of fastest 20: 31201.248
{code}


was (Author: csingh):
Result of running {{TestCapacitySchedulerPerf}} on branch-3.1
{code:java}
[INFO] Running 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPerf
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 373.312 
s - in 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPerf
[INFO] 
{code}
Result of running {{TestCapacitySchedulerPerf}} on branch-3.0
{code:java}
[INFO] Running 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPerf
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 277.687 
s - in 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPerf

#ResourceTypes = 2. Avg of fastest 20: 35460.992

#ResourceTypes = 5. Avg of fastest 20: 28129.395

#ResourceTypes = 4. Avg of fastest 20: 29498.525

#ResourceTypes = 3. Avg of fastest 20: 31201.248
{code}

> Perform SLS testing and run TestCapacitySchedulerPerf on trunk
> --
>
> Key: YARN-8458
> URL: https://issues.apache.org/jira/browse/YARN-8458
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: sls_snapshot_cpu_snapshot_june_25.nps, 
> sls_snapshot_memory_snapshot_june_25.nps
>
>
> Run SLS test and TestCapacitySchedulerPerf



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8458) Perform SLS testing and run TestCapacitySchedulerPerf on trunk

2018-06-25 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16522912#comment-16522912
 ] 

Chandni Singh edited comment on YARN-8458 at 6/25/18 11:19 PM:
---

Result of running {{TestCapacitySchedulerPerf}} on branch-3.1
{code:java}
[INFO] Running 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPerf
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 373.312 
s - in 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPerf
[INFO] 
{code}
Result of running {{TestCapacitySchedulerPerf}} on branch-3.0
{code:java}
[INFO] Running 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPerf
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 277.687 
s - in 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPerf

#ResourceTypes = 2. Avg of fastest 20: 35460.992

#ResourceTypes = 5. Avg of fastest 20: 28129.395

#ResourceTypes = 4. Avg of fastest 20: 29498.525

#ResourceTypes = 3. Avg of fastest 20: 31201.248
{code}


was (Author: csingh):
Result of running {{TestCapacitySchedulerPerf}} on branch-3.1
{code}
[INFO] Running 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPerf
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 373.312 
s - in 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPerf
[INFO] 
{code}

Result of running {{TestCapacitySchedulerPerf}} on branch-3.0
{code}
[INFO] Running 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPerf
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 277.687 
s - in 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPerf
{code}

> Perform SLS testing and run TestCapacitySchedulerPerf on trunk
> --
>
> Key: YARN-8458
> URL: https://issues.apache.org/jira/browse/YARN-8458
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: sls_snapshot_cpu_snapshot_june_25.nps, 
> sls_snapshot_memory_snapshot_june_25.nps
>
>
> Run SLS test and TestCapacitySchedulerPerf



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8458) Perform SLS testing and run TestCapacitySchedulerPerf on trunk

2018-06-25 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8458:

Summary: Perform SLS testing and run TestCapacitySchedulerPerf on trunk  
(was: Perform SLS testing and run TestCapacitySchedulerPerf on branch-3.1)

> Perform SLS testing and run TestCapacitySchedulerPerf on trunk
> --
>
> Key: YARN-8458
> URL: https://issues.apache.org/jira/browse/YARN-8458
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: sls_snapshot_cpu_snapshot_june_25.nps, 
> sls_snapshot_memory_snapshot_june_25.nps
>
>
> Run SLS test and TestCapacitySchedulerPerf



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8458) Perform SLS testing and run TestCapacitySchedulerPerf on branch-3.1

2018-06-25 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16522912#comment-16522912
 ] 

Chandni Singh commented on YARN-8458:
-

Result of running {{TestCapacitySchedulerPerf}} on branch-3.1
{code}
[INFO] Running 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPerf
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 373.312 
s - in 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPerf
[INFO] 
{code}

Result of running {{TestCapacitySchedulerPerf}} on branch-3.0
{code}
[INFO] Running 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPerf
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 277.687 
s - in 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPerf
{code}

> Perform SLS testing and run TestCapacitySchedulerPerf on branch-3.1
> ---
>
> Key: YARN-8458
> URL: https://issues.apache.org/jira/browse/YARN-8458
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: sls_snapshot_cpu_snapshot_june_25.nps, 
> sls_snapshot_memory_snapshot_june_25.nps
>
>
> Run SLS test and TestCapacitySchedulerPerf



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8458) Perform SLS testing and run TestCapacitySchedulerPerf on branch-3.1

2018-06-25 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16522911#comment-16522911
 ] 

Chandni Singh commented on YARN-8458:
-

SLS result:

Total has 441027 container allocated, 1470.09 containers allocated per second
Total has 441480 proposal accepted, 1562 rejected

> Perform SLS testing and run TestCapacitySchedulerPerf on branch-3.1
> ---
>
> Key: YARN-8458
> URL: https://issues.apache.org/jira/browse/YARN-8458
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: sls_snapshot_cpu_snapshot_june_25.nps, 
> sls_snapshot_memory_snapshot_june_25.nps
>
>
> Run SLS test and TestCapacitySchedulerPerf



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8458) Perform SLS testing and run TestCapacitySchedulerPerf on branch-3.1

2018-06-25 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8458:

Attachment: sls_snapshot_memory_snapshot_june_25.nps
sls_snapshot_cpu_snapshot_june_25.nps

> Perform SLS testing and run TestCapacitySchedulerPerf on branch-3.1
> ---
>
> Key: YARN-8458
> URL: https://issues.apache.org/jira/browse/YARN-8458
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: sls_snapshot_cpu_snapshot_june_25.nps, 
> sls_snapshot_memory_snapshot_june_25.nps
>
>
> Run SLS test and TestCapacitySchedulerPerf



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8458) Perform SLS testing and run TestCapacitySchedulerPerf on branch-3.1

2018-06-25 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8458:

Description: Run SLS test and TestCapacitySchedulerPerf  (was: Run )

> Perform SLS testing and run TestCapacitySchedulerPerf on branch-3.1
> ---
>
> Key: YARN-8458
> URL: https://issues.apache.org/jira/browse/YARN-8458
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>
> Run SLS test and TestCapacitySchedulerPerf



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8458) Perform SLS testing and run TestCapacitySchedulerPerf on branch-3.1

2018-06-25 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8458:

Summary: Perform SLS testing and run TestCapacitySchedulerPerf on 
branch-3.1  (was: Perform SLS testing and run TestCapacitySchedulerPerf)

> Perform SLS testing and run TestCapacitySchedulerPerf on branch-3.1
> ---
>
> Key: YARN-8458
> URL: https://issues.apache.org/jira/browse/YARN-8458
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>
> Run 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8458) Perform SLS testing and run TestCapacitySchedulerPerf

2018-06-25 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8458:

Description: Run 

> Perform SLS testing and run TestCapacitySchedulerPerf
> -
>
> Key: YARN-8458
> URL: https://issues.apache.org/jira/browse/YARN-8458
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>
> Run 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8458) Perform SLS testing and run TestCapacitySchedulerPerf

2018-06-25 Thread Chandni Singh (JIRA)
Chandni Singh created YARN-8458:
---

 Summary: Perform SLS testing and run TestCapacitySchedulerPerf
 Key: YARN-8458
 URL: https://issues.apache.org/jira/browse/YARN-8458
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Chandni Singh
Assignee: Chandni Singh






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8445) YARN native service doesn't allow service name equals to component name

2018-06-20 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518811#comment-16518811
 ] 

Chandni Singh commented on YARN-8445:
-

[~eyang] could you please review?

> YARN native service doesn't allow service name equals to component name
> ---
>
> Key: YARN-8445
> URL: https://issues.apache.org/jira/browse/YARN-8445
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Fix For: 3.1.1
>
> Attachments: YARN-8445.001.patch
>
>
> Currently, YARN service doesn't allow the service name to equal a component 
> name.
> Specifying such a name causes the AM launch to fail with a message like:
> {code} 
> org.apache.hadoop.metrics2.MetricsException: Metrics source tf-zeppelin 
> already exists!
>  at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
>  at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
>  at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
>  at 
> org.apache.hadoop.yarn.service.ServiceMetrics.register(ServiceMetrics.java:75)
>  at 
> org.apache.hadoop.yarn.service.component.Component.(Component.java:193)
>  at 
> org.apache.hadoop.yarn.service.ServiceScheduler.createAllComponents(ServiceScheduler.java:552)
>  at 
> org.apache.hadoop.yarn.service.ServiceScheduler.buildInstance(ServiceScheduler.java:251)
>  at 
> org.apache.hadoop.yarn.service.ServiceScheduler.serviceInit(ServiceScheduler.java:283)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>  at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>  at 
> org.apache.hadoop.yarn.service.ServiceMaster.serviceInit(ServiceMaster.java:142)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>  at org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:338)
> 2018-06-18 06:50:39,473 [main] INFO service.ServiceScheduler - Stopping 
> service scheduler
> {code}
> It's better to add this check in the validation phase instead of failing the AM.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8445) YARN native service doesn't allow service name equals to component name

2018-06-20 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8445:

Attachment: YARN-8445.001.patch

> YARN native service doesn't allow service name equals to component name
> ---
>
> Key: YARN-8445
> URL: https://issues.apache.org/jira/browse/YARN-8445
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Fix For: 3.1.1
>
> Attachments: YARN-8445.001.patch
>
>
> Currently, YARN service doesn't allow the service name to equal a component 
> name.
> Specifying such a name causes the AM launch to fail with a message like:
> {code} 
> org.apache.hadoop.metrics2.MetricsException: Metrics source tf-zeppelin 
> already exists!
>  at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
>  at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
>  at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
>  at 
> org.apache.hadoop.yarn.service.ServiceMetrics.register(ServiceMetrics.java:75)
>  at 
> org.apache.hadoop.yarn.service.component.Component.(Component.java:193)
>  at 
> org.apache.hadoop.yarn.service.ServiceScheduler.createAllComponents(ServiceScheduler.java:552)
>  at 
> org.apache.hadoop.yarn.service.ServiceScheduler.buildInstance(ServiceScheduler.java:251)
>  at 
> org.apache.hadoop.yarn.service.ServiceScheduler.serviceInit(ServiceScheduler.java:283)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>  at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
>  at 
> org.apache.hadoop.yarn.service.ServiceMaster.serviceInit(ServiceMaster.java:142)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>  at org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:338)
> 2018-06-18 06:50:39,473 [main] INFO service.ServiceScheduler - Stopping 
> service scheduler
> {code}
> It's better to add this check in the validation phase instead of failing the AM.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8445) YARN native service doesn't allow service name equals to component name

2018-06-20 Thread Chandni Singh (JIRA)
Chandni Singh created YARN-8445:
---

 Summary: YARN native service doesn't allow service name equals to 
component name
 Key: YARN-8445
 URL: https://issues.apache.org/jira/browse/YARN-8445
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chandni Singh
Assignee: Chandni Singh
 Fix For: 3.1.1


Currently, YARN service doesn't allow the service name to equal a component name.

Specifying such a name causes the AM launch to fail with a message like:

{code} 
org.apache.hadoop.metrics2.MetricsException: Metrics source tf-zeppelin already 
exists!
 at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
 at 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
 at 
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
 at 
org.apache.hadoop.yarn.service.ServiceMetrics.register(ServiceMetrics.java:75)
 at 
org.apache.hadoop.yarn.service.component.Component.(Component.java:193)
 at 
org.apache.hadoop.yarn.service.ServiceScheduler.createAllComponents(ServiceScheduler.java:552)
 at 
org.apache.hadoop.yarn.service.ServiceScheduler.buildInstance(ServiceScheduler.java:251)
 at 
org.apache.hadoop.yarn.service.ServiceScheduler.serviceInit(ServiceScheduler.java:283)
 at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
 at 
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
 at 
org.apache.hadoop.yarn.service.ServiceMaster.serviceInit(ServiceMaster.java:142)
 at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
 at org.apache.hadoop.yarn.service.ServiceMaster.main(ServiceMaster.java:338)
2018-06-18 06:50:39,473 [main] INFO service.ServiceScheduler - Stopping service 
scheduler
{code}

It's better to add this check in the validation phase instead of failing the AM.
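
For illustration, a client-side validation along these lines could reject the 
spec up front (hypothetical helper; the real check would go wherever the 
service spec is validated):
{code:java}
import java.util.List;

// Hypothetical validation helper: reject a spec whose service name collides
// with any component name before the AM is launched.
final class ServiceSpecValidation {
  static void checkNoNameCollision(String serviceName, List<String> componentNames) {
    for (String component : componentNames) {
      if (component.equals(serviceName)) {
        throw new IllegalArgumentException("Component name '" + component
            + "' must not be the same as the service name");
      }
    }
  }
}
{code}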



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8402) Yarn Service Destroy: Delete service entries from Zookeeper in the ServiceMaster instead of ServiceClient in the RM

2018-06-06 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8402:

Description: 
RM slows down considerably when multiple services are destroyed simultaneously.

1. Started approx 1000 services
2. Destroyed all the 1000 services.

Observed considerable slowness in the RM after this.
The {{ServiceClient}} in the RM uses the {{CuratorClient}} to delete the 
ZooKeeper entries.
The ZooKeeper client is the bottleneck; this could be avoided if the 
ZooKeeper entries were deleted from the AM, with the {{ServiceClient}} then 
only killing the app.
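
A rough sketch of the proposed direction, deleting the registry subtree from 
the AM so that the RM-side {{ServiceClient}} only kills the app (illustrative 
only; the real registry path and Curator handle would come from the service 
AM's context):
{code:java}
import org.apache.curator.framework.CuratorFramework;

// Illustrative sketch: delete the service's ZooKeeper registry subtree from
// the AM during service shutdown instead of from the ServiceClient in the RM.
final class RegistryCleanupSketch {
  static void deleteServiceEntries(CuratorFramework curator, String servicePath)
      throws Exception {
    if (curator.checkExists().forPath(servicePath) != null) {
      curator.delete().deletingChildrenIfNeeded().forPath(servicePath);
    }
  }
}
{code}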

  was:
The overwrite of service definition during flex is done from the ServiceClient. 
During auto finalization of upgrade, the current service definition gets 
overwritten as well by the service master. This creates a potential conflict. 

Need to move the overwrite of service definition during flex to the 
ServiceClient. 
Discussed on YARN-8018.


> Yarn Service Destroy: Delete service entries from Zookeeper in the 
> ServiceMaster instead of ServiceClient in the RM
> ---
>
> Key: YARN-8402
> URL: https://issues.apache.org/jira/browse/YARN-8402
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>
> RM slows down considerably when multiple services are destroyed 
> simultaneously.
> 1. Started approx 1000 services
> 2. Destroyed all the 1000 services.
> Observed considerable slowness in RM after this. 
> The {{ServiceClient}} in RM uses the {{CuratorClient}} to delete zookeeper 
> entries. 
> The zookeeper client is the bottleneck and this could be avoided if the 
> zookeeper entry can be deleted from the AM and then the {{ServiceClient}} can 
> kill the app.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8402) Yarn Service Destroy: Delete service entries from Zookeeper in the ServiceMaster instead of ServiceClient in the RM

2018-06-06 Thread Chandni Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8402:

Target Version/s:   (was: 3.1.1)

> Yarn Service Destroy: Delete service entries from Zookeeper in the 
> ServiceMaster instead of ServiceClient in the RM
> ---
>
> Key: YARN-8402
> URL: https://issues.apache.org/jira/browse/YARN-8402
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>
> RM slows down considerably when multiple services are destroyed 
> simultaneously.
> 1. Started approx 1000 services
> 2. Destroyed all the 1000 services.
> Observed considerable slowness in RM after this. 
> The {{ServiceClient}} in RM uses the {{CuratorClient}} to delete zookeeper 
> entries. 
> The zookeeper client is the bottleneck and this could be avoided if the 
> zookeeper entry can be deleted from the AM and then the {{ServiceClient}} can 
> kill the app.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8402) Yarn Service Destroy: Delete service entries from Zookeeper in the ServiceMaster instead of ServiceClient in the RM

2018-06-06 Thread Chandni Singh (JIRA)
Chandni Singh created YARN-8402:
---

 Summary: Yarn Service Destroy: Delete service entries from 
Zookeeper in the ServiceMaster instead of ServiceClient in the RM
 Key: YARN-8402
 URL: https://issues.apache.org/jira/browse/YARN-8402
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chandni Singh
Assignee: Chandni Singh


The overwrite of service definition during flex is done from the ServiceClient. 
During auto finalization of upgrade, the current service definition gets 
overwritten as well by the service master. This creates a potential conflict. 

Need to move the overwrite of service definition during flex to the 
ServiceClient. 
Discussed on YARN-8018.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8362) Number of remaining retries are updated twice after a container failure in NM

2018-05-25 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491101#comment-16491101
 ] 

Chandni Singh edited comment on YARN-8362 at 5/25/18 7:06 PM:
--

In patch 2, I fixed the checkstyle.
 The test failure 
{{org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager.testLocalingResourceWhileContainerRunning}}
 is not related to this change.

It fails in the existing trunk even without this change.


was (Author: csingh):
In patch 2, I fixed the checkstyle.
The test failure 
{{org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager.testLocalingResourceWhileContainerRunning}}
 is not related to this change.  

> Number of remaining retries are updated twice after a container failure in NM 
> --
>
> Key: YARN-8362
> URL: https://issues.apache.org/jira/browse/YARN-8362
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8362.001.patch, YARN-8362.002.patch
>
>
> With YARN-5015, {{shouldRetry(int errorCode)}} in {{ContainerImpl}} also 
> updates some fields in the retry context: remaining retries and restart times.
> This method is also called directly from outside the ContainerImpl class, in 
> {{ContainerLaunch.setContainerCompletedStatus}}. This causes the following 
> problems:
>  # remainingRetries is updated more than once after a failure. If 
> {{maxRetries = 1}}, a retry will not be triggered because of the multiple 
> calls to {{shouldRetry(int errorCode)}}.
>  # Writes to {{retryContext}} should be protected and only made while the 
> write lock is held.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8362) Number of remaining retries are updated twice after a container failure in NM

2018-05-25 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8362:

Attachment: YARN-8362.002.patch

> Number of remaining retries are updated twice after a container failure in NM 
> --
>
> Key: YARN-8362
> URL: https://issues.apache.org/jira/browse/YARN-8362
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8362.001.patch, YARN-8362.002.patch
>
>
> With YARN-5015, {{shouldRetry(int errorCode)}} in {{ContainerImpl}} also 
> updates some fields in the retry context: remaining retries and restart times.
> This method is also called directly from outside the ContainerImpl class, in 
> {{ContainerLaunch.setContainerCompletedStatus}}. This causes the following 
> problems:
>  # remainingRetries is updated more than once after a failure. If 
> {{maxRetries = 1}}, a retry will not be triggered because of the multiple 
> calls to {{shouldRetry(int errorCode)}}.
>  # Writes to {{retryContext}} should be protected and only made while the 
> write lock is held.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8362) Number of remaining retries are updated twice after a container failure in NM

2018-05-24 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8362:

Attachment: YARN-8362.001.patch

> Number of remaining retries are updated twice after a container failure in NM 
> --
>
> Key: YARN-8362
> URL: https://issues.apache.org/jira/browse/YARN-8362
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8362.001.patch
>
>
> With YARN-5015, {{shouldRetry(int errorCode)}} in {{ContainerImpl}} also 
> updates some fields in the retry context: remaining retries and restart times.
> This method is also called directly from outside the ContainerImpl class, in 
> {{ContainerLaunch.setContainerCompletedStatus}}. This causes the following 
> problems:
>  # remainingRetries is updated more than once after a failure. If 
> {{maxRetries = 1}}, a retry will not be triggered because of the multiple 
> calls to {{shouldRetry(int errorCode)}}.
>  # Writes to {{retryContext}} should be protected and only made while the 
> write lock is held.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8362) Number of remaining retries are updated twice after a container failure in NM

2018-05-24 Thread Chandni Singh (JIRA)
Chandni Singh created YARN-8362:
---

 Summary: Number of remaining retries are updated twice after a 
container failure in NM 
 Key: YARN-8362
 URL: https://issues.apache.org/jira/browse/YARN-8362
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chandni Singh
Assignee: Chandni Singh
 Fix For: 3.2.0, 3.1.1


With YARN-5015, {{shouldRetry(int errorCode)}} in {{ContainerImpl}} also 
updates some fields in the retry context: remaining retries and restart times.

This method is also called directly from outside the ContainerImpl class, in 
{{ContainerLaunch.setContainerCompletedStatus}}. This causes the following problems:
 # remainingRetries is updated more than once after a failure. If {{maxRetries 
= 1}}, a retry will not be triggered because of the multiple calls to 
{{shouldRetry(int errorCode)}}.
 # Writes to {{retryContext}} should be protected and only made while the write 
lock is held.
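
For illustration, a simplified sketch of the problem and the intended 
separation of the check from the state update (not the actual ContainerImpl 
code):
{code:java}
// Simplified sketch: a query method with side effects burns a retry on every
// call, so two independent callers exhaust maxRetries = 1 after one failure.
class RetryContextSketch {
  private int remainingRetries = 1;

  // Problematic pattern: checking whether to retry also mutates state.
  boolean shouldRetryWithSideEffect(int exitCode) {
    return exitCode != 0 && remainingRetries-- > 0;  // 2nd caller gets false
  }

  // Intended separation: a pure check, plus an explicit update performed once
  // while the container's write lock is held.
  boolean shouldRetry(int exitCode) {
    return exitCode != 0 && remainingRetries > 0;
  }

  void recordRetryAttempt() {
    remainingRetries--;
  }
}
{code}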



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8360) Yarn service conflict between restart policy and NM configuration

2018-05-24 Thread Chandni Singh (JIRA)
Chandni Singh created YARN-8360:
---

 Summary: Yarn service conflict between restart policy and NM 
configuration 
 Key: YARN-8360
 URL: https://issues.apache.org/jira/browse/YARN-8360
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Reporter: Chandni Singh


For the spec below, the service will not stop even after container failures 
because of the NM auto-retry properties:
 * "yarn.service.container-failure.retry.max": 1,
 * "yarn.service.container-failure.validity-interval-ms": 5000
 The NM will continue auto-restarting containers.
 {{fail_after 20}} fails after 20 seconds. Since the failure validity interval 
is 5 seconds, the NM will auto-restart the container.

{code:java}
{
  "name": "fail-demo2",
  "version": "1.0.0",
  "components" :
  [
{
  "name": "comp1",
  "number_of_containers": 1,
  "launch_command": "fail_after 20",
  "restart_policy": "NEVER",
  "resource": {
"cpus": 1,
"memory": "256"
  },
  "configuration": {
"properties": {
  "yarn.service.container-failure.retry.max": 1,
  "yarn.service.container-failure.validity-interval-ms": 5000
}
  }
}
  ]
}
{code}
If {{restart_policy}} is NEVER, then the service should stop after the 
container fails.

Since we have introduced service-level restart policies, I think we should 
make the NM auto-retry configurations part of the {{RetryPolicy}} and get rid 
of all the {{yarn.service.container-failure.**}} properties. Otherwise it gets 
confusing.
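
For illustration, a unified decision could look roughly like this (purely 
hypothetical names, not existing YARN classes):
{code:java}
// Hypothetical sketch: let the component's restart policy own the retry
// decision instead of separate NM auto-retry properties.
enum RestartPolicy { ALWAYS, ON_FAILURE, NEVER }

final class RestartDecisionSketch {
  static boolean shouldRestart(RestartPolicy policy, int exitCode,
      int failuresInValidityWindow, int maxFailures) {
    switch (policy) {
      case NEVER:
        return false;  // the service stops after the container fails
      case ON_FAILURE:
        return exitCode != 0 && failuresInValidityWindow < maxFailures;
      case ALWAYS:
      default:
        return true;
    }
  }
}
{code}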



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8357) Yarn Service: NPE when service is saved first and then started.

2018-05-24 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8357:

Attachment: YARN-8357.001.patch

> Yarn Service: NPE when service is saved first and then started.
> ---
>
> Key: YARN-8357
> URL: https://issues.apache.org/jira/browse/YARN-8357
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Critical
> Attachments: YARN-8357.001.patch
>
>
> Line 972 in {{ServiceClient}} returns a service with state {{null}}, which 
> is why there is an NPE.
> {code:java}
> 2018-05-24 04:39:22,911 INFO  client.ServiceClient 
> (ServiceClient.java:getStatus(1203)) - Service test1 does not have an 
> application ID
> 2018-05-24 04:39:22,911 ERROR webapp.ApiServer 
> (ApiServer.java:updateService(480)) - Error while performing operation for 
> app: test1
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.yarn.service.client.ServiceClient.actionStart(ServiceClient.java:974)
>         at 
> org.apache.hadoop.yarn.service.webapp.ApiServer$7.run(ApiServer.java:650)
>         at 
> org.apache.hadoop.yarn.service.webapp.ApiServer$7.run(ApiServer.java:644)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1687)
>         at 
> org.apache.hadoop.yarn.service.webapp.ApiServer.startService(ApiServer.java:644)
>         at 
> org.apache.hadoop.yarn.service.webapp.ApiServer.updateService(ApiServer.java:449)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
>         at 
> com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
>         at 
> com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
>         at 
> com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
>         at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>         at 
> com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
>         at 
> com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>         at 
> com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
>         at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
>         at 
> com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)
>         at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)
>         at 
> com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)
>         at 
> com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)
>         at 
> com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558)
>         at 
> com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-8357) Yarn Service: NPE when service is saved first and then started.

2018-05-24 Thread Chandni Singh (JIRA)
Chandni Singh created YARN-8357:
---

 Summary: Yarn Service: NPE when service is saved first and then 
started.
 Key: YARN-8357
 URL: https://issues.apache.org/jira/browse/YARN-8357
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Chandni Singh
Assignee: Chandni Singh


Line 972 in {{ServiceClient}} returns a service with state {{null}}, which is 
why there is an NPE.
{code:java}
2018-05-24 04:39:22,911 INFO  client.ServiceClient 
(ServiceClient.java:getStatus(1203)) - Service test1 does not have an 
application ID
2018-05-24 04:39:22,911 ERROR webapp.ApiServer 
(ApiServer.java:updateService(480)) - Error while performing operation for app: 
test1

java.lang.NullPointerException

        at 
org.apache.hadoop.yarn.service.client.ServiceClient.actionStart(ServiceClient.java:974)

        at 
org.apache.hadoop.yarn.service.webapp.ApiServer$7.run(ApiServer.java:650)

        at 
org.apache.hadoop.yarn.service.webapp.ApiServer$7.run(ApiServer.java:644)

        at java.security.AccessController.doPrivileged(Native Method)

        at javax.security.auth.Subject.doAs(Subject.java:422)

        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1687)

        at 
org.apache.hadoop.yarn.service.webapp.ApiServer.startService(ApiServer.java:644)

        at 
org.apache.hadoop.yarn.service.webapp.ApiServer.updateService(ApiServer.java:449)

        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

        at java.lang.reflect.Method.invoke(Method.java:498)

        at 
com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)

        at 
com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)

        at 
com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)

        at 
com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)

        at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)

        at 
com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)

        at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)

        at 
com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)

        at 
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)

        at 
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)

        at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)

        at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)

        at 
com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)

        at 
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558)

        at 
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733)
{code}
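
A minimal sketch of the kind of guard that would turn the NPE into a clear 
error (simplified types; not the actual patch):
{code:java}
// Illustrative guard: a service that was saved but never started can have a
// null state; fail fast with a clear message instead of an NPE downstream.
final class StartGuardSketch {
  static void checkStartable(String serviceName, Object serviceState) {
    if (serviceState == null) {
      throw new IllegalStateException("Service " + serviceName
          + " has no state; it was saved but never started");
    }
  }
}
{code}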



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7530) hadoop-yarn-services-api should be part of hadoop-yarn-services

2018-05-23 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-7530:

Attachment: YARN-7530-branch-3.1.001.patch

> hadoop-yarn-services-api should be part of hadoop-yarn-services
> ---
>
> Key: YARN-7530
> URL: https://issues.apache.org/jira/browse/YARN-7530
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Chandni Singh
>Priority: Blocker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-7530-branch-3.1.001.patch, YARN-7530.001.patch, 
> YARN-7530.002.patch
>
>
> Hadoop-yarn-services-api is currently a parallel project to the 
> hadoop-yarn-services project. It would be better if hadoop-yarn-services-api 
> were part of hadoop-yarn-services for correctness.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8341) Yarn Service: Integration tests

2018-05-22 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484564#comment-16484564
 ] 

Chandni Singh edited comment on YARN-8341 at 5/22/18 9:36 PM:
--

The mvn additions are in the yarn-services-api pom. To run the integration 
tests (an example test is sketched below):
 # cd 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-api
 # mvn failsafe:integration-test -Drm.host=localhost -Duser.name=root
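
For illustration, a test in this category might look like the following sketch 
(hypothetical class; it only assumes the {{IntegrationTest}} marker interface 
used by the category configuration and the {{rm.host}} / {{user.name}} system 
properties shown above):
{code:java}
import org.junit.Test;
import org.junit.experimental.categories.Category;
import static org.junit.Assert.assertNotNull;

// Marker interface assumed to match the category configured in the pom.
interface IntegrationTest {}

// Hypothetical integration test: it reads the target cluster from the same
// system properties that are passed on the mvn command line.
@Category(IntegrationTest.class)
public class ExampleServiceApiIT {
  private final String rmHost = System.getProperty("rm.host", "localhost");
  private final String user = System.getProperty("user.name");

  @Test
  public void clusterPropertiesAreProvided() {
    assertNotNull("rm.host must be set", rmHost);
    assertNotNull("user.name must be set", user);
    // a real test would call the YARN services REST API on rmHost here
  }
}
{code}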


was (Author: csingh):
The mvn additions are in yarn-services-api pom. In order to run the Integration 
tests
# cd 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-api
# mvn failsafe:integration-test 
-Drm.host=ctr-e138-1518143905142-80042-01-02.hwx.site -Duser.name=root

> Yarn Service: Integration tests 
> 
>
> Key: YARN-8341
> URL: https://issues.apache.org/jira/browse/YARN-8341
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8341.wip.patch
>
>
> In order to test the rest api end-to-end, we can add Integration tests for 
> Yarn service api. 
> The integration tests 
> * belong to junit category {{IntegrationTest}}.
> * will be only run when triggered by executing {{mvn 
> failsafe:integration-test}}
> * the surefire plugin for regular tests excludes {{IntegrationTest}}
> * RM host, user name, and any additional properties which are needed to 
> execute the tests against a cluster can be passed as System properties.
> For eg. {{mvn failsafe:integration-test -Drm.host=localhost -Duser.name=root}}
> We can add more integration tests which can check scalability and performance.
> Having these tests here benefits everyone in the community because anyone can 
> run them against their own cluster. 
> Attaching a work in progress patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8341) Yarn Service: Integration tests

2018-05-22 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8341:

Attachment: (was: YARN-8341.wip.patch)

> Yarn Service: Integration tests 
> 
>
> Key: YARN-8341
> URL: https://issues.apache.org/jira/browse/YARN-8341
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8341.wip.patch
>
>
> In order to test the REST API end-to-end, we can add integration tests for 
> the Yarn service API. 
> The integration tests:
> * belong to the junit category {{IntegrationTest}}.
> * will only be run when triggered by executing {{mvn failsafe:integration-test}}
> * are excluded by the surefire plugin that runs the regular tests
> * take the RM host, user name, and any additional properties needed to 
> execute the tests against a cluster as System properties.
> For example, {{mvn failsafe:integration-test -Drm.host=localhost -Duser.name=root}}
> We can add more integration tests that check scalability and performance.
> Having these tests here benefits everyone in the community because anyone can 
> run them against their cluster. 
> Attaching a work-in-progress patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Updated] (YARN-8341) Yarn Service: Integration tests

2018-05-22 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8341:

Attachment: YARN-8341.wip.patch

> Yarn Service: Integration tests 
> 
>
> Key: YARN-8341
> URL: https://issues.apache.org/jira/browse/YARN-8341
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8341.wip.patch, YARN-8341.wip.patch
>
>
> In order to test the REST API end-to-end, we can add integration tests for 
> the Yarn service API. 
> The integration tests:
> * belong to the junit category {{IntegrationTest}}.
> * will only be run when triggered by executing {{mvn failsafe:integration-test}}
> * are excluded by the surefire plugin that runs the regular tests
> * take the RM host, user name, and any additional properties needed to 
> execute the tests against a cluster as System properties.
> For example, {{mvn failsafe:integration-test -Drm.host=localhost -Duser.name=root}}
> We can add more integration tests that check scalability and performance.
> Having these tests here benefits everyone in the community because anyone can 
> run them against their cluster. 
> Attaching a work-in-progress patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Commented] (YARN-8341) Yarn Service: Integration tests

2018-05-22 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484564#comment-16484564
 ] 

Chandni Singh commented on YARN-8341:
-

The mvn additions are in yarn-services-api pom. In order to run the Integration 
tests
# cd 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-api
# mvn failsafe:integration-test 
-Drm.host=ctr-e138-1518143905142-80042-01-02.hwx.site -Duser.name=root

> Yarn Service: Integration tests 
> 
>
> Key: YARN-8341
> URL: https://issues.apache.org/jira/browse/YARN-8341
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8341.wip.patch
>
>
> In order to test the REST API end-to-end, we can add integration tests for 
> the Yarn service API. 
> The integration tests:
> * belong to the junit category {{IntegrationTest}}.
> * will only be run when triggered by executing {{mvn failsafe:integration-test}}
> * are excluded by the surefire plugin that runs the regular tests
> * take the RM host, user name, and any additional properties needed to 
> execute the tests against a cluster as System properties.
> For example, {{mvn failsafe:integration-test -Drm.host=localhost -Duser.name=root}}
> We can add more integration tests that check scalability and performance.
> Having these tests here benefits everyone in the community because anyone can 
> run them against their cluster. 
> Attaching a work-in-progress patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Updated] (YARN-8341) Yarn Service: Integration tests

2018-05-22 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8341:

Attachment: YARN-8341.wip.patch

> Yarn Service: Integration tests 
> 
>
> Key: YARN-8341
> URL: https://issues.apache.org/jira/browse/YARN-8341
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Attachments: YARN-8341.wip.patch
>
>
> In order to test the REST API end-to-end, we can add integration tests for 
> the Yarn service API. 
> The integration tests:
> * belong to the junit category {{IntegrationTest}}.
> * will only be run when triggered by executing {{mvn failsafe:integration-test}}
> * are excluded by the surefire plugin that runs the regular tests
> * take the RM host, user name, and any additional properties needed to 
> execute the tests against a cluster as System properties.
> For example, {{mvn failsafe:integration-test -Drm.host=localhost -Duser.name=root}}
> We can add more integration tests that check scalability and performance.
> Having these tests here benefits everyone in the community because anyone can 
> run them against their cluster. 
> Attaching a work-in-progress patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Created] (YARN-8341) Yarn Service: Integration tests

2018-05-22 Thread Chandni Singh (JIRA)
Chandni Singh created YARN-8341:
---

 Summary: Yarn Service: Integration tests 
 Key: YARN-8341
 URL: https://issues.apache.org/jira/browse/YARN-8341
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Chandni Singh
Assignee: Chandni Singh


In order to test the REST API end-to-end, we can add integration tests for 
the Yarn service API. 
The integration tests:
* belong to the junit category {{IntegrationTest}}.
* will only be run when triggered by executing {{mvn failsafe:integration-test}}
* are excluded by the surefire plugin that runs the regular tests
* take the RM host, user name, and any additional properties needed to execute 
the tests against a cluster as System properties.
For example, {{mvn failsafe:integration-test -Drm.host=localhost -Duser.name=root}}

We can add more integration tests that check scalability and performance.
Having these tests here benefits everyone in the community because anyone can 
run them against their cluster. 

Attaching a work-in-progress patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Commented] (YARN-7530) hadoop-yarn-services-api should be part of hadoop-yarn-services

2018-05-18 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481385#comment-16481385
 ] 

Chandni Singh commented on YARN-7530:
-

Thanks [~eyang]  for reviewing and committing the patch. 
[~gsaha] thanks for reviewing.

> hadoop-yarn-services-api should be part of hadoop-yarn-services
> ---
>
> Key: YARN-7530
> URL: https://issues.apache.org/jira/browse/YARN-7530
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Chandni Singh
>Priority: Trivial
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-7530.001.patch, YARN-7530.002.patch
>
>
> Hadoop-yarn-services-api is currently a parallel project to the 
> hadoop-yarn-services project. It would be better if hadoop-yarn-services-api 
> were part of hadoop-yarn-services for correctness.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Commented] (YARN-7530) hadoop-yarn-services-api should be part of hadoop-yarn-services

2018-05-18 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16480977#comment-16480977
 ] 

Chandni Singh commented on YARN-7530:
-

[~gsaha] All the unit tests for the service API and service core pass on my machine.

Jenkins failed to verify the patch, but I was able to run the command below 
successfully for the hadoop project.
{code}
mvn clean install -Pdist -Dtar -Dmaven.javadoc.skip=true -DskipShade -Danimal.sniffer.skip=true
{code}
I'll check whether something was missed.


> hadoop-yarn-services-api should be part of hadoop-yarn-services
> ---
>
> Key: YARN-7530
> URL: https://issues.apache.org/jira/browse/YARN-7530
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Chandni Singh
>Priority: Trivial
> Fix For: yarn-native-services
>
> Attachments: YARN-7530.001.patch, YARN-7530.002.patch
>
>
> Hadoop-yarn-services-api is currently a parallel project to the 
> hadoop-yarn-services project. It would be better if hadoop-yarn-services-api 
> were part of hadoop-yarn-services for correctness.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Updated] (YARN-7530) hadoop-yarn-services-api should be part of hadoop-yarn-services

2018-05-17 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-7530:

Attachment: YARN-7530.002.patch

> hadoop-yarn-services-api should be part of hadoop-yarn-services
> ---
>
> Key: YARN-7530
> URL: https://issues.apache.org/jira/browse/YARN-7530
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn-native-services
>Affects Versions: 3.1.0
>Reporter: Eric Yang
>Assignee: Chandni Singh
>Priority: Trivial
> Fix For: yarn-native-services
>
> Attachments: YARN-7530.001.patch, YARN-7530.002.patch
>
>
> Hadoop-yarn-services-api is currently a parallel project to the 
> hadoop-yarn-services project. It would be better if hadoop-yarn-services-api 
> were part of hadoop-yarn-services for correctness.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Commented] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec

2018-05-17 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479791#comment-16479791
 ] 

Chandni Singh commented on YARN-8141:
-

Thanks [~eyang] for reviewing and merging.

Thanks [~shaneku...@gmail.com], [~billie.rinaldi], and [~leftnoteasy] for the 
reviews.

> YARN Native Service: Respect 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
> --
>
> Key: YARN-8141
> URL: https://issues.apache.org/jira/browse/YARN-8141
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
>  Labels: Docker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8141.001.patch, YARN-8141.002.patch, 
> YARN-8141.003.patch, YARN-8141.004.patch, YARN-8141.005.patch, 
> YARN-8141.006.patch
>
>
> The existing YARN native service overwrites 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless of whether the 
> user specified it in the service spec or not. It is important to allow users 
> to mount local folders like /etc/passwd, etc.
> The following logic inside AbstractLauncher.java overwrites the 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment variable:
> {code:java}
> StringBuilder sb = new StringBuilder();
> for (Entry<String, String> mount : mountPaths.entrySet()) {
>   if (sb.length() > 0) {
>     sb.append(",");
>   }
>   sb.append(mount.getKey());
>   sb.append(":");
>   sb.append(mount.getValue());
> }
> env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS",
>     sb.toString());
> {code}
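
For illustration only, a hedged sketch of one way the launcher could preserve a user-supplied value instead of overwriting it; this fragment reuses {{env}} and {{mountPaths}} from the snippet above and is not taken from the attached patches:
{code:java}
// Sketch: append the service-managed mounts to whatever the user already
// set in the service spec, rather than replacing it outright.
String mountsVar = "YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS";
StringBuilder sb = new StringBuilder();
String userMounts = env.get(mountsVar);   // value from the service spec, if any
if (userMounts != null && !userMounts.isEmpty()) {
  sb.append(userMounts);
}
for (Map.Entry<String, String> mount : mountPaths.entrySet()) {
  if (sb.length() > 0) {
    sb.append(",");
  }
  sb.append(mount.getKey()).append(":").append(mount.getValue());
}
env.put(mountsVar, sb.toString());
{code}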



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Updated] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec

2018-05-17 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8141:

Attachment: YARN-8141.006.patch

> YARN Native Service: Respect 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
> --
>
> Key: YARN-8141
> URL: https://issues.apache.org/jira/browse/YARN-8141
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
>  Labels: Docker
> Attachments: YARN-8141.001.patch, YARN-8141.002.patch, 
> YARN-8141.003.patch, YARN-8141.004.patch, YARN-8141.005.patch, 
> YARN-8141.006.patch
>
>
> The existing YARN native service overwrites 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless of whether the 
> user specified it in the service spec or not. It is important to allow users 
> to mount local folders like /etc/passwd, etc.
> The following logic inside AbstractLauncher.java overwrites the 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment variable:
> {code:java}
> StringBuilder sb = new StringBuilder();
> for (Entry<String, String> mount : mountPaths.entrySet()) {
>   if (sb.length() > 0) {
>     sb.append(",");
>   }
>   sb.append(mount.getKey());
>   sb.append(":");
>   sb.append(mount.getValue());
> }
> env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS",
>     sb.toString());
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Updated] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec

2018-05-17 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8141:

Attachment: YARN-8141.005.patch

> YARN Native Service: Respect 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
> --
>
> Key: YARN-8141
> URL: https://issues.apache.org/jira/browse/YARN-8141
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
>  Labels: Docker
> Attachments: YARN-8141.001.patch, YARN-8141.002.patch, 
> YARN-8141.003.patch, YARN-8141.004.patch, YARN-8141.005.patch
>
>
> Existing YARN native service overwrites 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless if user 
> specified this in service spec or not. It is important to allow user to mount 
> local folders like /etc/passwd, etc.
> Following logic overwrites the 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment:
> {code:java}
> StringBuilder sb = new StringBuilder();
> for (Entry mount : mountPaths.entrySet()) {
>   if (sb.length() > 0) {
> sb.append(",");
>   }
>   sb.append(mount.getKey());
>   sb.append(":");
>   sb.append(mount.getValue());
> }
> env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS", 
> sb.toString());{code}
> Inside AbstractLauncher.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Commented] (YARN-8081) Yarn Service Upgrade: Add support to upgrade a component

2018-05-16 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16476929#comment-16476929
 ] 

Chandni Singh commented on YARN-8081:
-

Thanks [~eyang] for reviewing and merging.

> Yarn Service Upgrade: Add support to upgrade a component
> 
>
> Key: YARN-8081
> URL: https://issues.apache.org/jira/browse/YARN-8081
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8081.001.patch, YARN-8081.002.patch, 
> YARN-8081.003.patch
>
>
> Yarn service upgrade should provide an API to upgrade the component.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Commented] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec

2018-05-15 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16476578#comment-16476578
 ] 

Chandni Singh commented on YARN-8141:
-

Failure of  
org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager.testContainerUpgradeRollbackDueToFailure
 looks unrelated. The test passes on my machine.


> YARN Native Service: Respect 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
> --
>
> Key: YARN-8141
> URL: https://issues.apache.org/jira/browse/YARN-8141
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
> Attachments: YARN-8141.001.patch, YARN-8141.002.patch, 
> YARN-8141.003.patch, YARN-8141.004.patch
>
>
> The existing YARN native service overwrites 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless of whether the 
> user specified it in the service spec or not. It is important to allow users 
> to mount local folders like /etc/passwd, etc.
> The following logic inside AbstractLauncher.java overwrites the 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment variable:
> {code:java}
> StringBuilder sb = new StringBuilder();
> for (Entry<String, String> mount : mountPaths.entrySet()) {
>   if (sb.length() > 0) {
>     sb.append(",");
>   }
>   sb.append(mount.getKey());
>   sb.append(":");
>   sb.append(mount.getValue());
> }
> env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS",
>     sb.toString());
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Created] (YARN-8301) Yarn Service Upgrade: Add documentation

2018-05-15 Thread Chandni Singh (JIRA)
Chandni Singh created YARN-8301:
---

 Summary: Yarn Service Upgrade: Add documentation
 Key: YARN-8301
 URL: https://issues.apache.org/jira/browse/YARN-8301
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chandni Singh
Assignee: Chandni Singh


Add documentation for yarn service upgrade.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Commented] (YARN-8300) Fix NPE in DefaultUpgradeComponentsFinder

2018-05-15 Thread Chandni Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16476502#comment-16476502
 ] 

Chandni Singh commented on YARN-8300:
-

Thanks [~suma.shivaprasad] for catching this. Looks good to me. 

> Fix NPE in DefaultUpgradeComponentsFinder 
> --
>
> Key: YARN-8300
> URL: https://issues.apache.org/jira/browse/YARN-8300
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Suma Shivaprasad
>Assignee: Suma Shivaprasad
>Priority: Major
> Attachments: YARN-8300.1.patch
>
>
> In current upgrades for Yarn native services, we do not support 
> addition/deletion of components during upgrade. On trying to upgrade with the 
> same number of components in the target spec as in the current service spec, 
> but with one of the components having a new target spec and name, we see the 
> following NPE in the service AM logs:
> {noformat}
> 2018-05-15 00:10:41,489 [IPC Server handler 0 on 37488] ERROR 
> service.ClientAMService - Error while trying to upgrade service {} 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.yarn.service.UpgradeComponentsFinder$DefaultUpgradeComponentsFinder.lambda$findTargetComponentSpecs$0(UpgradeComponentsFinder.java:103)
>   at java.util.ArrayList.forEach(ArrayList.java:1257)
>   at 
> org.apache.hadoop.yarn.service.UpgradeComponentsFinder$DefaultUpgradeComponentsFinder.findTargetComponentSpecs(UpgradeComponentsFinder.java:100)
>   at 
> org.apache.hadoop.yarn.service.ServiceManager.processUpgradeRequest(ServiceManager.java:259)
>   at 
> org.apache.hadoop.yarn.service.ClientAMService.upgrade(ClientAMService.java:163)
>   at 
> org.apache.hadoop.yarn.service.impl.pb.service.ClientAMProtocolPBServiceImpl.upgradeService(ClientAMProtocolPBServiceImpl.java:81)
>   at 
> org.apache.hadoop.yarn.proto.ClientAMProtocol$ClientAMProtocolService$2.callBlockingMethod(ClientAMProtocol.java:5972)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1682)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
> {noformat}
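
For context, a hedged sketch of the kind of null guard that avoids this NPE when a target component name has no counterpart in the current spec; the type and method names (e.g. {{getComponent}}) are assumptions based on the stack trace, and the actual fix in the attached patch may differ:
{code:java}
// Sketch: resolve every target component against the current spec by name and
// fail fast with a clear message instead of dereferencing a null result.
for (Component targetComp : targetSpec.getComponents()) {
  Component currentComp = currentSpec.getComponent(targetComp.getName());
  if (currentComp == null) {
    throw new UnsupportedOperationException("Component " + targetComp.getName()
        + " does not exist in the current spec; addition/deletion of components"
        + " during upgrade is not supported");
  }
  // ... otherwise compare currentComp with targetComp to find what to upgrade
}
{code}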



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Created] (YARN-8299) Yarn Service Upgrade: Add GET APIs that returns components/instances matching query params

2018-05-15 Thread Chandni Singh (JIRA)
Chandni Singh created YARN-8299:
---

 Summary: Yarn Service Upgrade: Add GET APIs that returns 
components/instances matching query params
 Key: YARN-8299
 URL: https://issues.apache.org/jira/browse/YARN-8299
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chandni Singh
Assignee: Chandni Singh


We need GET APIs that return containers/components matching the query params. 
These are needed so that we can find out which containers/components have been 
upgraded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Updated] (YARN-8298) Yarn Service Upgrade: Support fast component upgrades which accepts component spec

2018-05-15 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8298:

Summary: Yarn Service Upgrade: Support fast component upgrades which 
accepts component spec  (was: Yarn Service Upgrade: Support fast component 
upgrades that accept component spec)

> Yarn Service Upgrade: Support fast component upgrades which accepts component 
> spec
> --
>
> Key: YARN-8298
> URL: https://issues.apache.org/jira/browse/YARN-8298
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>
> Currently, service upgrade involves 2 steps:
>  * initiate the upgrade by providing the new spec
>  * trigger the upgrade of each instance/component
>  
> We need to add the ability to upgrade a component in one shot with a call that 
> accepts the spec of the component. However, there are a couple of limitations 
> when upgrading in this way:
>  # Aborting the upgrade will not be supported
>  # Upgrade finalization will be done automatically.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Created] (YARN-8298) Yarn Service Upgrade: Support fast component upgrades that accept component spec

2018-05-15 Thread Chandni Singh (JIRA)
Chandni Singh created YARN-8298:
---

 Summary: Yarn Service Upgrade: Support fast component upgrades 
that accept component spec
 Key: YARN-8298
 URL: https://issues.apache.org/jira/browse/YARN-8298
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chandni Singh
Assignee: Chandni Singh


Currently, service upgrade involves 2 steps:
 * initiate the upgrade by providing the new spec
 * trigger the upgrade of each instance/component

We need to add the ability to upgrade a component in one shot with a call that 
accepts the spec of the component. However, there are a couple of limitations 
when upgrading in this way:
 # Aborting the upgrade will not be supported
 # Upgrade finalization will be done automatically.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Updated] (YARN-8141) YARN Native Service: Respect YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec

2018-05-15 Thread Chandni Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chandni Singh updated YARN-8141:

Attachment: YARN-8141.004.patch

> YARN Native Service: Respect 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS specified in service spec
> --
>
> Key: YARN-8141
> URL: https://issues.apache.org/jira/browse/YARN-8141
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn-native-services
>Reporter: Wangda Tan
>Assignee: Chandni Singh
>Priority: Critical
> Attachments: YARN-8141.001.patch, YARN-8141.002.patch, 
> YARN-8141.003.patch, YARN-8141.004.patch
>
>
> The existing YARN native service overwrites 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS regardless of whether the 
> user specified it in the service spec or not. It is important to allow users 
> to mount local folders like /etc/passwd, etc.
> The following logic inside AbstractLauncher.java overwrites the 
> YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS environment variable:
> {code:java}
> StringBuilder sb = new StringBuilder();
> for (Entry<String, String> mount : mountPaths.entrySet()) {
>   if (sb.length() > 0) {
>     sb.append(",");
>   }
>   sb.append(mount.getKey());
>   sb.append(":");
>   sb.append(mount.getValue());
> }
> env.put("YARN_CONTAINER_RUNTIME_DOCKER_LOCAL_RESOURCE_MOUNTS",
>     sb.toString());
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)



