[jira] [Commented] (YARN-4205) Add a service for monitoring application life time out

2016-09-05 Thread Jian He (JIRA)

[ https://issues.apache.org/jira/browse/YARN-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15466519#comment-15466519 ]

Jian He commented on YARN-4205:
---

Thanks Rohith, looks good overall. A few comments:

- Can you clarify the definition of lifetime in the API, and also mention the
time unit in the getter API?
- RMAppRecoveredTransition: this will cause a lot of log output for active apps
on RM recovery. Remove it, since this is already logged in the normal run path?
Or move it to debug level (see the sketch after the snippet)?
{code}
LOG.info("Application " + app.applicationId
+ " is registered with 
Application lifetime monitor after recovery. "
+ "The lifetime configured 
is " + applicationLifetime + " seconds");
{code}
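If it is kept at debug level, a minimal sketch (assuming the same LOG object
and variables are in scope) would be:
{code}
// Sketch only: guard the recovery-path message at debug level to avoid
// flooding the RM log when many active apps are recovered.
if (LOG.isDebugEnabled()) {
  LOG.debug("Application " + app.applicationId
      + " is registered with Application lifetime monitor after recovery."
      + " The lifetime configured is " + applicationLifetime + " seconds");
}
{code}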
- Use getLong for the interval, since the value is in milliseconds?
{code}
int monitorInterval = conf.getInt(
    YarnConfiguration.RM_APPLICATION_LIFETIME_MONITOR_INTERVAL_MS,
    YarnConfiguration.DEFAULT_RM_APPLICATION_LIFETIME_MONITOR_INTERVAL_MS);
{code}
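For example, a minimal sketch of the suggested change (assuming the default
constant is, or can be made, a long):
{code}
// Sketch only: read the monitor interval as a long number of milliseconds.
long monitorInterval = conf.getLong(
    YarnConfiguration.RM_APPLICATION_LIFETIME_MONITOR_INTERVAL_MS,
    YarnConfiguration.DEFAULT_RM_APPLICATION_LIFETIME_MONITOR_INTERVAL_MS);
{code}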

> Add a service for monitoring application life time out
> --
>
> Key: YARN-4205
> URL: https://issues.apache.org/jira/browse/YARN-4205
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: nijel
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4205.patch, 0002-YARN-4205.patch, 
> YARN-4205_01.patch, YARN-4205_02.patch, YARN-4205_03.patch
>
>
> This JIRA intends to provide a lifetime monitor service.
> The service will monitor applications for which a lifetime is configured.
> If an application runs beyond its lifetime, it will be killed.
> The lifetime is measured from the application's submit time.
> The monitoring thread's interval is configurable.






[jira] [Commented] (YARN-5445) Log aggregation configured to different namenode can fail fast

2016-09-05 Thread Chackaravarthy (JIRA)

[ https://issues.apache.org/jira/browse/YARN-5445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15466498#comment-15466498 ]

Chackaravarthy commented on YARN-5445:
--

Could you please take a look at the patch and share your suggestions? Thanks in advance.

> Log aggregation configured to different namenode can fail fast
> --
>
> Key: YARN-5445
> URL: https://issues.apache.org/jira/browse/YARN-5445
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: Chackaravarthy
> Attachments: YARN-5445-1.patch
>
>
> Log aggregation is enabled and configured to write application logs to a
> different cluster or a different namespace (NN federation). In these cases, it
> would be good to have configs for the number of attempts or retries, so that
> we fail fast in case the other cluster is completely down.
> Currently the default {{dfs.client.failover.max.attempts}} of 15 is used,
> which adds a latency of 2 to 2.5 minutes to each container launch (per node
> manager).
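
For illustration only, a minimal sketch of the idea (not the attached patch);
{{dfs.client.failover.max.attempts}} and {{dfs.client.retry.max.attempts}} are
existing HDFS client keys, while the variable names here are placeholders:
{code}
// Sketch only: use a dedicated Configuration for the remote log-aggregation
// filesystem with lower HDFS client retry/failover limits, so a completely
// down remote cluster is detected quickly instead of after ~15 attempts.
Configuration remoteConf = new Configuration(conf);
remoteConf.setInt("dfs.client.failover.max.attempts", 3);
remoteConf.setInt("dfs.client.retry.max.attempts", 3);
FileSystem remoteFs = remoteAppLogDir.getFileSystem(remoteConf);
{code}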






[jira] [Created] (YARN-5618) Support for Intra queue preemption framework

2016-09-05 Thread Sunil G (JIRA)
Sunil G created YARN-5618:
-

 Summary: Support for Intra queue preemption framework
 Key: YARN-5618
 URL: https://issues.apache.org/jira/browse/YARN-5618
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sunil G
Assignee: Sunil G


Currently the inter-queue preemption framework covers the basics (configs,
scheduling monitor interval, etc.). The new intra-queue framework will come as a
new CandidateSelector policy. Priority and user-limit preemption will be part of
this framework.

This is a tracking jira for the framework implementation alone.






[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue

2016-09-05 Thread Sunil G (JIRA)

[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15466489#comment-15466489 ]

Sunil G commented on YARN-4945:
---

Thanks [~eepayne]

bq. LeafQueue#getApplications returns an unmodifiable Collection
Yes, I have made changes to handle this scenario.

bq. if it's already in selectedCandidates, it's because an inter-queue
preemption policy put it there
Let me give some more clarity on what I am trying to do here. It is possible
that some containers selected by the priority/user-limit policy were already
selected by the inter-queue policies. In that case, we do not need to mark them
again; rather, we can deduct the resource directly, since the container is
already marked for preemption.
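Roughly, the intent is something like the following sketch (variable names are
borrowed from the snippet in your comment; this is not the actual patch):
{code}
// Sketch of the intent (not the actual patch): if an inter-queue policy has
// already selected this container, do not mark it again; just account for its
// resources once against what still needs to be obtained.
if (CapacitySchedulerPreemptionUtils.isContainerAlreadySelected(c,
    selectedCandidates)) {
  Resources.subtractFrom(toObtainByPartition, c.getAllocatedResource());
  continue;
}
{code}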

bq. container's resources twice from toObtainByPartition
That was a mistake; I corrected it in the second patch.


> [Umbrella] Capacity Scheduler Preemption Within a queue
> ---
>
> Key: YARN-4945
> URL: https://issues.apache.org/jira/browse/YARN-4945
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
> Attachments: Intra-Queue Preemption Use Cases.pdf, 
> IntraQueuepreemption-CapacityScheduler (Design).pdf, YARN-2009-wip.2.patch, 
> YARN-2009-wip.patch
>
>
> This is an umbrella ticket to track efforts on preemption within a queue, to
> support features like YARN-2009, YARN-2113, and YARN-4781.






[jira] [Commented] (YARN-5608) TestAMRMClient.setup() fails with ArrayOutOfBoundsException

2016-09-05 Thread Rohith Sharma K S (JIRA)

[ https://issues.apache.org/jira/browse/YARN-5608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15466422#comment-15466422 ]

Rohith Sharma K S commented on YARN-5608:
-

+1 LGTM

> TestAMRMClient.setup() fails with ArrayOutOfBoundsException
> ---
>
> Key: YARN-5608
> URL: https://issues.apache.org/jira/browse/YARN-5608
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
> Attachments: YARN-5608.002.patch, YARN-5608.003.patch, 
> YARN-5608.004.patch, YARN-5608.005.patch, YARN-5608.patch
>
>
> After 39 runs of the {{TestAMRMClient}} test, I encountered:
> {noformat}
> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>   at java.util.ArrayList.rangeCheck(ArrayList.java:635)
>   at java.util.ArrayList.get(ArrayList.java:411)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TestAMRMClient.setup(TestAMRMClient.java:144)
> {noformat}
> I see it shows up occasionally in the error emails as well.






[jira] [Commented] (YARN-4945) [Umbrella] Capacity Scheduler Preemption Within a queue

2016-09-05 Thread Eric Payne (JIRA)

[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15465592#comment-15465592 ]

Eric Payne commented on YARN-4945:
--

[~leftnoteasy] and [~sunilg],
bq. Using logic similar to 
{{deductPreemptableResourcesBasedSelectedCandidates}} should be able to achieve 
this, and I think it doesn't bring too many complexities to the implementation.
I'm sorry, but I still don't understand how this can work.

In {{PriorityCandidatesSelector#preemptFromLeastStarvedApp}}:
{code}
if (CapacitySchedulerPreemptionUtils.isContainerAlreadySelected(c,
    selectedCandidates)) {
  Resources.subtractFrom(toObtainByPartition, c.getAllocatedResource());
  Resources.subtractFrom(toObtainByPartition, c.getAllocatedResource());
  continue;
}
{code}
This code seems to indicate that if a container is already in 
{{selectedCandidates}}, it will be preempted and then given back to apps in 
this queue. But if it's already in {{selectedCandidates}}, it's because an 
inter-queue preemption policy put it there, so it's not likely to end up back 
in this queue. Please help me understand what I'm missing.

Also, why is it subtracting the container's resources twice from
{{toObtainByPartition}}? Should one of those be
{{totalPreemptedResourceAllowed}}?

> [Umbrella] Capacity Scheduler Preemption Within a queue
> ---
>
> Key: YARN-4945
> URL: https://issues.apache.org/jira/browse/YARN-4945
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
> Attachments: Intra-Queue Preemption Use Cases.pdf, 
> IntraQueuepreemption-CapacityScheduler (Design).pdf, YARN-2009-wip.2.patch, 
> YARN-2009-wip.patch
>
>
> This is an umbrella ticket to track efforts on preemption within a queue, to
> support features like YARN-2009, YARN-2113, and YARN-4781.






[jira] [Commented] (YARN-3854) Add localization support for docker images

2016-09-05 Thread Zhankun Tang (JIRA)

[ https://issues.apache.org/jira/browse/YARN-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15465230#comment-15465230 ]

Zhankun Tang commented on YARN-3854:


[~vvasudev], thanks for the review.
Yes, I want to go ahead and start the implementation. It would be great if
[~shaneku...@gmail.com] could help.

How about we create the following sub-tasks under this JIRA (or YARN-3611)?
1. Add support for the Docker pull command
2. Add a Docker-type local resource to enable Docker image localization
3. Add support for Docker image cleanup

> Add localization support for docker images
> --
>
> Key: YARN-3854
> URL: https://issues.apache.org/jira/browse/YARN-3854
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Reporter: Sidharta Seethana
>Assignee: Zhankun Tang
> Attachments: YARN-3854-branch-2.8.001.patch, 
> YARN-3854_Localization_support_for_Docker_image_v1.pdf, 
> YARN-3854_Localization_support_for_Docker_image_v2.pdf, 
> YARN-3854_Localization_support_for_Docker_image_v3.pdf
>
>
> We need the ability to localize Docker images when those images aren't already
> available locally. There are various approaches that could be used here, with
> different trade-offs/issues: image archives on HDFS + docker load, docker pull
> during the localization phase, or (automatic) docker pull during the
> run/launch phase.
> We also need the ability to clean up old/stale, unused images.






[jira] [Commented] (YARN-3692) Allow REST API to set a user generated message when killing an application

2016-09-05 Thread Steve Loughran (JIRA)

[ https://issues.apache.org/jira/browse/YARN-3692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15464776#comment-15464776 ]

Steve Loughran commented on YARN-3692:
--

I concur with the incompat issue. This needs to be something that downgrades gracefully.

> Allow REST API to set a user generated message when killing an application
> --
>
> Key: YARN-3692
> URL: https://issues.apache.org/jira/browse/YARN-3692
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Rajat Jain
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-3692.patch, 0002-YARN-3692.patch
>
>
> Currently YARN's REST API supports killing an application without setting a
> diagnostic message. It would be good to provide that support.
> *Use Case*
> Usually this helps with workflow management in a multi-tenant environment when
> the workflow scheduler (or the Hadoop admin) wants to kill a job and let the
> user know the reason why the job was killed. Killing the job while setting a
> diagnostic message is a very good solution for that. Ideally, we can set the
> diagnostic message on all such interfaces:
> yarn kill -applicationId ... -diagnosticMessage "some message added by
> admin/workflow"
> REST API { 'state': 'KILLED', 'diagnosticMessage': 'some message added by
> admin/workflow'}
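
For illustration, a client-side sketch of the proposed REST call. The
{{/ws/v1/cluster/apps/{appid}/state}} endpoint already exists in the RM REST
API; the {{diagnosticMessage}} field is the proposed addition, and the host,
port, and {{appId}} variable here are placeholders:
{code}
// Sketch only: PUT a kill request that carries the proposed diagnosticMessage
// field. The endpoint path follows the existing Cluster Application State API;
// diagnosticMessage is the proposed addition, not part of the current API.
URL url = new URL("http://rm-host:8088/ws/v1/cluster/apps/" + appId + "/state");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("PUT");
conn.setRequestProperty("Content-Type", "application/json");
conn.setDoOutput(true);
String body = "{\"state\":\"KILLED\","
    + "\"diagnosticMessage\":\"some message added by admin/workflow\"}";
try (OutputStream os = conn.getOutputStream()) {
  os.write(body.getBytes(StandardCharsets.UTF_8));
}
int rc = conn.getResponseCode();  // expect 200 (or 202 while the kill is pending)
{code}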






[jira] [Comment Edited] (YARN-666) [Umbrella] Support rolling upgrades in YARN

2016-09-05 Thread Brahma Reddy Battula (JIRA)

[ https://issues.apache.org/jira/browse/YARN-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15464330#comment-15464330 ]

Brahma Reddy Battula edited comment on YARN-666 at 9/5/16 8:33 AM:
---

Sorry for coming in late. I did not see any documentation for this. I feel it
would be good if the rolling upgrade/downgrade/rollback process were documented,
as it is for HDFS.


was (Author: brahmareddy):
Sorry for coming in late. I feel it would be good if this were documented, as it
is for HDFS.

> [Umbrella] Support rolling upgrades in YARN
> ---
>
> Key: YARN-666
> URL: https://issues.apache.org/jira/browse/YARN-666
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: graceful, rolling upgrade
>Affects Versions: 2.0.4-alpha
>Reporter: Siddharth Seth
> Fix For: 2.6.0
>
> Attachments: YARN_Rolling_Upgrades.pdf, YARN_Rolling_Upgrades_v2.pdf
>
>
> Jira to track changes required in YARN to allow rolling upgrades, including 
> documentation and possible upgrade routes. 






[jira] [Commented] (YARN-2255) YARN Audit logging not added to log4j.properties

2016-09-05 Thread Ying Zhang (JIRA)

[ https://issues.apache.org/jira/browse/YARN-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15464378#comment-15464378 ]

Ying Zhang commented on YARN-2255:
--

Hi [~varun_saxena], would you mind if I take over this JIRA and continue working
on it?

> YARN Audit logging not added to log4j.properties
> 
>
> Key: YARN-2255
> URL: https://issues.apache.org/jira/browse/YARN-2255
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.4.0
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>
> The log4j.properties file that is part of the Hadoop package doesn't have YARN
> audit logging tied to it. This leads to audit logs being generated in the
> normal log files. Audit logs should be generated in a separate log file.
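
For illustration, a sketch of the kind of entries that could be added to
log4j.properties (the appender name, file name, and layout here are assumptions;
RMAuditLogger is the existing RM audit logger class):
{code}
# Sketch only: route RM audit logs to a separate file (names/paths are examples).
log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger=INFO,RMAUDIT
log4j.additivity.org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger=false
log4j.appender.RMAUDIT=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RMAUDIT.File=${hadoop.log.dir}/rm-audit.log
log4j.appender.RMAUDIT.layout=org.apache.log4j.PatternLayout
log4j.appender.RMAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
{code}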






[jira] [Commented] (YARN-666) [Umbrella] Support rolling upgrades in YARN

2016-09-05 Thread Brahma Reddy Battula (JIRA)

[ https://issues.apache.org/jira/browse/YARN-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15464330#comment-15464330 ]

Brahma Reddy Battula commented on YARN-666:
---

Sorry for coming in late. I feel it would be good if this were documented, as it
is for HDFS.

> [Umbrella] Support rolling upgrades in YARN
> ---
>
> Key: YARN-666
> URL: https://issues.apache.org/jira/browse/YARN-666
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: graceful, rolling upgrade
>Affects Versions: 2.0.4-alpha
>Reporter: Siddharth Seth
> Fix For: 2.6.0
>
> Attachments: YARN_Rolling_Upgrades.pdf, YARN_Rolling_Upgrades_v2.pdf
>
>
> Jira to track changes required in YARN to allow rolling upgrades, including 
> documentation and possible upgrade routes. 






[jira] [Commented] (YARN-5576) Allow resource localization while container is running

2016-09-05 Thread Jian He (JIRA)

[ https://issues.apache.org/jira/browse/YARN-5576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15464221#comment-15464221 ]

Jian He commented on YARN-5576:
---

The failed tests are passing locally for me.

> Allow resource localization while container is running
> --
>
> Key: YARN-5576
> URL: https://issues.apache.org/jira/browse/YARN-5576
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-5576.1.patch, YARN-5576.2.patch, YARN-5576.3.patch, 
> YARN-5576.4.branch-2.patch, YARN-5576.4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org