Re: [VOTE] Release Apache Hadoop 3.3.2 - RC2

2022-01-21 Thread Eric Payne
+1 (binding)

- Built from source

- Brought up a non-secure virtual cluster w/ NN, 1 DN, RM, AHS, JHS, and 3 NMs

- Validated inter- and intra-queue preemption

- Validated exclusive node labels

Thanks a lot Chao for your diligence and hard work on this release.

Eric

On Wednesday, January 19, 2022, 11:50:34 AM CST, Chao Sun  
wrote: 





Hi all,

I've put together Hadoop 3.3.2 RC2 below:

The RC is available at: http://people.apache.org/~sunchao/hadoop-3.3.2-RC2/
The RC tag is at:
https://github.com/apache/hadoop/releases/tag/release-3.3.2-RC2
The Maven artifacts are staged at:
https://repository.apache.org/content/repositories/orgapachehadoop-1332

You can find my public key at:
https://downloads.apache.org/hadoop/common/KEYS

I've done the following tests and they look good:
- Ran all the unit tests
- Started a single node HDFS cluster and tested a few simple commands
- Ran all the tests in Spark using the RC2 artifacts

Please evaluate the RC and vote, thanks!

Best,
Chao




[jira] [Resolved] (YARN-10848) Vcore allocation problem with DefaultResourceCalculator

2021-11-11 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne resolved YARN-10848.
---
Resolution: Not A Problem

I am closing this JIRA based on the above discussion.

> Vcore allocation problem with DefaultResourceCalculator
> ---
>
> Key: YARN-10848
> URL: https://issues.apache.org/jira/browse/YARN-10848
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Reporter: Peter Bacsko
>Assignee: Minni Mittal
>Priority: Major
>  Labels: pull-request-available
> Attachments: TestTooManyContainers.java
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> If we use DefaultResourceCalculator, then Capacity Scheduler keeps allocating 
> containers even if we run out of vcores.
> CS checks the available resources in two places. The first check is in 
> {{CapacityScheduler.allocateContainerOnSingleNode()}}:
> {noformat}
> if (calculator.computeAvailableContainers(Resources
> .add(node.getUnallocatedResource(), 
> node.getTotalKillableResources()),
> minimumAllocation) <= 0) {
>   LOG.debug("This node " + node.getNodeID() + " doesn't have sufficient "
>   + "available or preemptible resource for minimum allocation");
> {noformat}
> The second, which is more important, is located in 
> {{RegularContainerAllocator.assignContainer()}}:
> {noformat}
> if (!Resources.fitsIn(rc, capability, totalResource)) {
>   LOG.warn("Node : " + node.getNodeID()
>   + " does not have sufficient resource for ask : " + pendingAsk
>   + " node total capability : " + node.getTotalResource());
>   // Skip this locality request
>   ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
>   activitiesManager, node, application, schedulerKey,
>   ActivityDiagnosticConstant.
>   NODE_TOTAL_RESOURCE_INSUFFICIENT_FOR_REQUEST
>   + getResourceDiagnostics(capability, totalResource),
>   ActivityLevel.NODE);
>   return ContainerAllocation.LOCALITY_SKIPPED;
> }
> {noformat}
> Here, {{rc}} is the resource calculator instance, the other two values are:
> {noformat}
> Resource capability = pendingAsk.getPerAllocationResource();
> Resource available = node.getUnallocatedResource();
> {noformat}
> There is a repro unit test attached to this case, which demonstrates the 
> problem. The root cause is that we pass the resource calculator to 
> {{Resources.fitsIn()}}. Instead, we should use the overload without the 
> calculator, just like in {{FSAppAttempt.assignContainer()}}:
> {noformat}
>// Can we allocate a container on this node?
> if (Resources.fitsIn(capability, available)) {
>   // Inform the application of the new container for this request
>   RMContainer allocatedContainer =
>   allocate(type, node, schedulerKey, pendingAsk,
>   reservedContainer);
> {noformat}
> In CS, if we switch to DominantResourceCalculator OR use 
> {{Resources.fitsIn()}} without the calculator in 
> {{RegularContainerAllocator.assignContainer()}}, that fixes the failing unit 
> test (see {{testTooManyContainers()}} in {{TestTooManyContainers.java}}).
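
To make the root cause concrete, here is a minimal, self-contained sketch 
(assuming the two {{Resources.fitsIn()}} overloads shown in the quoted code 
above) of why the calculator-based check passes while the component-wise check 
correctly fails:
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

public class FitsInDemo {
  public static void main(String[] args) {
    ResourceCalculator rc = new DefaultResourceCalculator();
    // Ask: 1 GB, 4 vcores. Node: 8 GB free but only 2 vcores left.
    Resource ask = Resource.newInstance(1024, 4);
    Resource nodeTotal = Resource.newInstance(8192, 2);

    // Calculator-based check: DefaultResourceCalculator compares memory only,
    // so the ask appears to fit even though vcores are exhausted.
    System.out.println(Resources.fitsIn(rc, ask, nodeTotal));  // true

    // Component-wise overload, as used by FSAppAttempt.assignContainer():
    // every entry in the resource vector must fit, so this returns false.
    System.out.println(Resources.fitsIn(ask, nodeTotal));      // false
  }
}
{code}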






[jira] [Resolved] (YARN-9975) Support proxy ACL user for CapacityScheduler

2021-10-07 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne resolved YARN-9975.
--
Resolution: Duplicate

I'm closing this as a dup of YARN-1115. Please reopen if you disagree.

> Support proxy ACL user for CapacityScheduler
> 
>
> Key: YARN-9975
> URL: https://issues.apache.org/jira/browse/YARN-9975
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: zhoukang
>Assignee: zhoukang
>Priority: Major
>
> As commented in YARN-9698.
> I will open a new jira for the proxy user feature. 
> The background is that we have a long-running SQL thriftserver for many users:
> {quote}{{user->sql proxy-> sql thriftserver}}{quote}
> But we do not have keytabs for all users on 'sql proxy'. We just use a super 
> user like 'sql_prc' to submit the 'sql thriftserver' application. To support 
> this, we should change the scheduler to support a proxy-user ACL.






[jira] [Created] (YARN-10935) AM Total Queue Limit goes below per-user AM Limit if parent is full.

2021-09-07 Thread Eric Payne (Jira)
Eric Payne created YARN-10935:
-

 Summary: AM Total Queue Limit goes below per-user AM Limit if 
parent is full.
 Key: YARN-10935
 URL: https://issues.apache.org/jira/browse/YARN-10935
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacity scheduler, capacityscheduler
Reporter: Eric Payne


This happens when DRF is enabled and all of one resource is consumed but the 
second resource still has plenty available.

This is reproducible by setting up a parent queue where the capacity and max 
capacity are the same, with 2 or more sub-queues whose max capacity is 100%.

In one of the sub-queues, start a long-running app that consumes all resources 
in the parent queue's hierarchy. This app will consume all of the memory but 
not very many vcores (for example).

In a second queue, submit an app. The *{{Max Application Master Resources Per 
User}}* limit ends up much higher than the *{{Max Application Master Resources}}* 
limit, which should never happen.
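
A minimal capacity-scheduler.xml sketch of the repro setup described above 
(queue names and capacity values are hypothetical; the point is that the 
parent's capacity equals its max capacity while each sub-queue may grow to 
100%):
{code:xml}
<property>
  <name>yarn.scheduler.capacity.root.parent.queues</name>
  <value>sub1,sub2</value>
</property>
<!-- Parent capacity equals maximum capacity. -->
<property>
  <name>yarn.scheduler.capacity.root.parent.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.parent.maximum-capacity</name>
  <value>50</value>
</property>
<!-- Each sub-queue may grow to 100% of the parent. -->
<property>
  <name>yarn.scheduler.capacity.root.parent.sub1.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.parent.sub1.maximum-capacity</name>
  <value>100</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.parent.sub2.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.parent.sub2.maximum-capacity</name>
  <value>100</value>
</property>
{code}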








[jira] [Created] (YARN-10834) Intra-queue preemption: apps that don't use defined custom resource won't be preempted.

2021-06-25 Thread Eric Payne (Jira)
Eric Payne created YARN-10834:
-

 Summary: Intra-queue preemption: apps that don't use defined 
custom resource won't be preempted.
 Key: YARN-10834
 URL: https://issues.apache.org/jira/browse/YARN-10834
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Eric Payne
Assignee: Eric Payne


YARN-8292 added handling of negative resources during the preemption 
calculation phase. That JIRA hard-coded it so that for inter-(cross-)queue 
preemption, a single resource in the vector could go negative while 
calculating ideal assignments and preemptions. It also hard-coded it so that 
during intra-(in-)queue preemption calculations, no resource could go 
negative. YARN-10613 made these options configurable.

However, in clusters where custom resources are defined, apps that don't use 
the extended resource won't be preempted.






[jira] [Created] (YARN-10613) Config to allow Intra-queue preemption to enable/disable conservativeDRF

2021-02-03 Thread Eric Payne (Jira)
Eric Payne created YARN-10613:
-

 Summary: Config to allow Intra-queue preemption to enable/disable 
conservativeDRF
 Key: YARN-10613
 URL: https://issues.apache.org/jira/browse/YARN-10613
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacity scheduler, scheduler preemption
Affects Versions: 2.10.1, 3.1.4, 3.2.2, 3.3.0
Reporter: Eric Payne
Assignee: Eric Payne


YARN-8292 added code that prevents CS intra-queue preemption from preempting 
containers from an app unless all of the major resources used by the app are 
greater than the user limit for that user.

Ex:
| Used <memory, vcores> | User Limit <memory, vcores> |
| <58GB, 58> | <30GB, 300> |

In this example, only used memory is above the user limit, not used vcores. So, 
intra-queue preemption will not occur.

YARN-8292 added the {{conservativeDRF}} flag to 
{{CapacitySchedulerPreemptionUtils#tryPreemptContainerAndDeductResToObtain}}. 
If {{conservativeDRF}} is false, containers will be preempted from apps in the 
example state. If true, containers will not be preempted.

This flag is hard-coded to false for Inter-queue (cross-queue) preemption and 
true for intra-queue (in-queue) preemption.

I propose that in some cases, we want intra-queue preemption to be more 
aggressive and preempt in the example case. To accommodate that, I propose the 
addition of the following config property:
{code:xml}
  <property>
    <name>yarn.resourcemanager.monitor.capacity.preemption.intra-queue-preemption.conservative-drf</name>
    <value>true</value>
  </property>
{code}






[jira] [Resolved] (YARN-10164) Allow NM to start even when custom resource type not defined

2021-01-25 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne resolved YARN-10164.
---
Resolution: Won't Do

> Allow NM to start even when custom resource type not defined
> 
>
> Key: YARN-10164
> URL: https://issues.apache.org/jira/browse/YARN-10164
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
>
> In the [custom resource 
> documentation|https://hadoop.apache.org/docs/r3.2.1/hadoop-yarn/hadoop-yarn-site/ResourceModel.html],
>  it tells you to add the number of custom resources to a property called 
> {{yarn.nodemanager.resource-type.<resource>}} in a file called 
> {{node-resources.xml}}.
> For GPU resources, this would look something like
> {code:xml}
>   <property>
>     <name>yarn.nodemanager.resource-type.gpu</name>
>     <value>16</value>
>   </property>
> {code}
> A corresponding config property must also be in {{resource-types.xml}} called 
> yarn.resource-types:
> {code:xml}
>   <property>
>     <name>yarn.resource-types</name>
>     <value>gpu</value>
>     <description>Custom resources to be used for scheduling.</description>
>   </property>
> {code}
> If the yarn.nodemanager.resource-type.gpu property exists without the 
> corresponding yarn.resource-types property, the nodemanager fails to start.
> I would like the option to automatically create the node-resources.xml on all 
> new nodes regardless of whether or not the cluster supports GPU resources so 
> that if I deploy a GPU node into an existing cluster that does not (yet) 
> support GPU resources, the nodemanager will at least start. Even though it 
> doesn't support the GPU resource, the other supported resources will still be 
> available to be used by the apps in the cluster.






[jira] [Created] (YARN-10471) Prevent logs for any container from becoming larger than a configurable size.

2020-10-23 Thread Eric Payne (Jira)
Eric Payne created YARN-10471:
-

 Summary: Prevent logs for any container from becoming larger than 
a configurable size.
 Key: YARN-10471
 URL: https://issues.apache.org/jira/browse/YARN-10471
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 3.1.4, 3.2.1
Reporter: Eric Payne
Assignee: Eric Payne


Provide a way to configure a cluster such that a task attempt is killed if any 
of its container logs exceeds a configured size. This would help prevent logs 
from filling disks and also avoid aggregating enormous logs.
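
This JIRA does not define a property name yet. Purely as a hypothetical 
illustration of the intended knob (the name and value below are invented here 
and are not part of YARN), the configuration might look like:
{code:xml}
<!-- Hypothetical property; YARN-10471 does not define a name or default. -->
<property>
  <name>yarn.nodemanager.container-log.max-size-bytes</name>
  <!-- Kill the task attempt if any single container log exceeds ~10 GB. -->
  <value>10737418240</value>
</property>
{code}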






[jira] [Created] (YARN-10456) RM PartitionQueueMetrics records are named QueueMetrics in Simon metrics registry

2020-10-09 Thread Eric Payne (Jira)
Eric Payne created YARN-10456:
-

 Summary: RM PartitionQueueMetrics records are named QueueMetrics 
in Simon metrics registry
 Key: YARN-10456
 URL: https://issues.apache.org/jira/browse/YARN-10456
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.10.1, 3.1.4, 3.2.1, 3.3.0
Reporter: Eric Payne
Assignee: Eric Payne


Several queue metrics (such as AppsRunning, PendingContainers, etc.) stopped 
working after we upgraded to 2.10.






[jira] [Created] (YARN-10451) RM (v1) UI NodesPage can NPE when yarn.io/gpu resource type is defined.

2020-10-02 Thread Eric Payne (Jira)
Eric Payne created YARN-10451:
-

 Summary: RM (v1) UI NodesPage can NPE when yarn.io/gpu resource 
type is defined.
 Key: YARN-10451
 URL: https://issues.apache.org/jira/browse/YARN-10451
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Eric Payne


The NodesPage in the RM (v1) UI will NPE when the {{yarn.resource-types}} 
property defines {{yarn.io/gpu}}.






[jira] [Resolved] (YARN-1741) XInclude support broken for YARN ResourceManager

2020-07-21 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne resolved YARN-1741.
--
Resolution: Won't Fix

bq. Since branch-2.8 is EOL, I propose that we close this as Won't Fix.
+1

> XInclude support broken for YARN ResourceManager
> 
>
> Key: YARN-1741
> URL: https://issues.apache.org/jira/browse/YARN-1741
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Eric Sirianni
>Assignee: Xuan Gong
>Priority: Critical
>  Labels: regression
>
> The XInclude support in Hadoop configuration files (introduced via 
> HADOOP-4944) was broken by the recent {{ConfigurationProvider}} changes to 
> YARN ResourceManager.  Specifically, YARN-1459 and, more generally, the 
> YARN-1611 family of JIRAs for ResourceManager HA.
> The issue is that {{ConfigurationProvider}} provides a raw {{InputStream}} as 
> a {{Configuration}} resource for what was previously a {{Path}}-based 
> resource.  
> For {{Path}} resources, the absolute file path is used as the {{systemId}} 
> for the {{DocumentBuilder.parse()}} call:
> {code}
>   } else if (resource instanceof Path) {  // a file resource
> ...
>   doc = parse(builder, new BufferedInputStream(
>   new FileInputStream(file)), ((Path)resource).toString());
> }
> {code}
> The {{systemId}} is used to resolve XIncludes (among other things):
> {code}
> /**
>  * Parse the content of the given InputStream as an
>  * XML document and return a new DOM Document object.
> ...
>  * @param systemId Provide a base for resolving relative URIs.
> ...
>  */
> public Document parse(InputStream is, String systemId)
> {code}
> However, for loading raw {{InputStream}} resources, the {{systemId}} is set 
> to {{null}}:
> {code}
>   } else if (resource instanceof InputStream) {
> doc = parse(builder, (InputStream) resource, null);
> {code}
> causing XInclude resolution to fail.
> In our particular environment, we make extensive use of XIncludes to 
> standardize common configuration parameters across multiple Hadoop clusters.
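
A minimal sketch of the kind of fix implied here, assuming plain JAXP (this is 
an illustration, not the actual patch): supply a base {{systemId}} when parsing 
an {{InputStream}} resource so that relative XIncludes can resolve.
{code:java}
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XIncludeParseSketch {
  public static Document parseWithBase(InputStream is, String baseUri)
      throws Exception {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    dbf.setNamespaceAware(true);
    dbf.setXIncludeAware(true);  // enable XInclude processing
    DocumentBuilder builder = dbf.newDocumentBuilder();
    // A non-null systemId gives the parser a base URI against which
    // relative xi:include hrefs are resolved; passing null (as the
    // InputStream branch quoted above does) breaks that resolution.
    return builder.parse(is, baseUri);  // e.g. "file:///etc/hadoop/conf/"
  }
}
{code}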






[jira] [Created] (YARN-10343) Legacy RM UI should include labeled metrics for allocated, total, and reserved resources.

2020-07-07 Thread Eric Payne (Jira)
Eric Payne created YARN-10343:
-

 Summary: Legacy RM UI should include labeled metrics for 
allocated, total, and reserved resources.
 Key: YARN-10343
 URL: https://issues.apache.org/jira/browse/YARN-10343
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 3.1.3, 3.2.1, 2.10.0
Reporter: Eric Payne
Assignee: Eric Payne









[jira] [Resolved] (YARN-9767) PartitionQueueMetrics Issues

2020-06-04 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne resolved YARN-9767.
--
Resolution: Duplicate

> PartitionQueueMetrics Issues
> 
>
> Key: YARN-9767
> URL: https://issues.apache.org/jira/browse/YARN-9767
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Attachments: YARN-9767.001.patch
>
>
> The intent of the Jira is to capture the issues/observations encountered as 
> part of YARN-6492 development separately for ease of tracking.
> Observations:
> Please refer this 
> https://issues.apache.org/jira/browse/YARN-6492?focusedCommentId=16904027&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16904027
> 1. Since partition info is extracted from both the request and the node, there 
> is a problem. For example:
>  
> Node N has been mapped to Label X (non-exclusive). Queue A has been 
> configured with the ANY node label. App A requested resources from Queue A and 
> its containers ran on Node N for some reason. During the 
> AbstractCSQueue#allocateResource call, the node partition (via SchedulerNode) 
> is used for the calculation. Let's say the allocate call has been fired for 3 
> containers of 1 GB each; then the outcome is:
> a. PartitionDefault * queue A -> pending mb is 3 GB
> b. PartitionX * queue A -> pending mb is -3 GB
>  
> This happens because the app request was fired without any label 
> specification, which derives metric #a. After allocation is over, pending 
> resources get decreased, and that path uses the node partition info, which 
> derives metric #b.
>  
> Given this situation, we will need to put some thought into getting these 
> metrics right.
>  
> 2. Though the intent of this jira is to do Partition Queue Metrics, we would 
> like to retain the existing Queue Metrics for backward compatibility (as you 
> can see from the jira's discussion).
> With this patch and the YARN-9596 patch, queuemetrics (for queues) would be 
> overridden either with specific partition values or with default partition 
> values, or vice versa. For example, after a queue (say 
> queue A) has been initialized with min and max capacities, and also with node 
> label min and max capacities, QueueMetrics (availableMB) for queue A returns 
> values based on the node label's cap config.
> I've been working on these observations to provide a fix and attached 
> .005.WIP.patch. The focus of .005.WIP.patch is to ensure availableMB and 
> availableVcores are correct (please refer to observation #2 above). Added more 
> asserts in {{testQueueMetricsWithLabelsOnDefaultLabelNode}} to ensure the fix 
> for #2 is working properly.
> Also note that user metrics for availableMB and availableVcores at the root 
> queue were not there even before; the same behaviour is retained. User metrics 
> for availableMB and availableVcores are available only at the child queue 
> level and also per partition.
>  






[jira] [Created] (YARN-10251) Show extended resources on legacy RM UI.

2020-04-28 Thread Eric Payne (Jira)
Eric Payne created YARN-10251:
-

 Summary: Show extended resources on legacy RM UI.
 Key: YARN-10251
 URL: https://issues.apache.org/jira/browse/YARN-10251
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Eric Payne
Assignee: Eric Payne
 Attachments: Legacy RM UI With Not All Resources Shown.png, Updated 
Legacy RM UI With All Resources Shown.png








[jira] [Created] (YARN-10164) Allow NM to start even when custom resource type not defined

2020-02-25 Thread Eric Payne (Jira)
Eric Payne created YARN-10164:
-

 Summary: Allow NM to start even when custom resource type not 
defined
 Key: YARN-10164
 URL: https://issues.apache.org/jira/browse/YARN-10164
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Eric Payne
Assignee: Eric Payne


In the [custom resource 
documentation|https://hadoop.apache.org/docs/r3.2.1/hadoop-yarn/hadoop-yarn-site/ResourceModel.html],
 it tells you to add the number of custom resources to a property called 
{{yarn.nodemanager.resource-type.<resource>}} in a file called 
{{node-resources.xml}}.

For GPU resources, this would look something like
{code:xml}
  <property>
    <name>yarn.nodemanager.resource-type.gpu</name>
    <value>16</value>
  </property>
{code}

A corresponding config property must also be in {{resource-types.xml}} called 
yarn.resource-types:
{code:xml}
  <property>
    <name>yarn.resource-types</name>
    <value>gpu</value>
    <description>Custom resources to be used for scheduling.</description>
  </property>
{code}

If the yarn.nodemanager.resource-type.gpu property exists without the 
corresponding yarn.resource-types property, the nodemanager fails to start.

I would like the option to automatically create the node-resources.xml on all 
new nodes regardless of whether or not the cluster supports GPU resources so 
that if I deploy a GPU node into an existing cluster that does not (yet) 
support GPU resources, the nodemanager will at least start. Even though it 
doesn't support the GPU resource, the other supported resources will still be 
available to be used by the apps in the cluster.






[jira] [Resolved] (YARN-9790) Failed to set default-application-lifetime if maximum-application-lifetime is less than or equal to zero

2020-01-23 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne resolved YARN-9790.
--
Fix Version/s: 2.10.1
   3.1.4
   3.2.2
   Resolution: Fixed

> Failed to set default-application-lifetime if maximum-application-lifetime is 
> less than or equal to zero
> 
>
> Key: YARN-9790
> URL: https://issues.apache.org/jira/browse/YARN-9790
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: kyungwan nam
>Assignee: kyungwan nam
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.4, 2.10.1
>
> Attachments: YARN-9790.001.patch, YARN-9790.002.patch, 
> YARN-9790.003.patch, YARN-9790.004.patch
>
>
> capacity-scheduler
> {code}
> ...
> yarn.scheduler.capacity.root.dev.maximum-application-lifetime=-1
> yarn.scheduler.capacity.root.dev.default-application-lifetime=604800
> {code}
> refreshQueues failed as follows:
> {code}
> 2019-08-28 15:21:57,423 WARN  resourcemanager.AdminService 
> (AdminService.java:logAndWrapException(910)) - Exception refresh queues.
> java.io.IOException: Failed to re-init queues : Default lifetime604800 can't 
> exceed maximum lifetime -1
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:477)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:423)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:394)
> at 
> org.apache.hadoop.yarn.server.api.impl.pb.service.ResourceManagerAdministrationProtocolPBServiceImpl.refreshQueues(ResourceManagerAdministrationProtocolPBServiceImpl.java:114)
> at 
> org.apache.hadoop.yarn.proto.ResourceManagerAdministrationProtocol$ResourceManagerAdministrationProtocolService$2.callBlockingMethod(ResourceManagerAdministrationProtocol.java:271)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
> Caused by: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Default 
> lifetime604800 can't exceed maximum lifetime -1
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.setupQueueConfigs(LeafQueue.java:268)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.<init>(LeafQueue.java:162)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.<init>(LeafQueue.java:141)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.parseQueue(CapacitySchedulerQueueManager.java:259)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.parseQueue(CapacitySchedulerQueueManager.java:283)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:171)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:726)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:472)
> ... 12 more
> {code}
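
Given the resolution above, the intended behavior is that a non-positive 
maximum lifetime means "unlimited". A minimal sketch of that validation logic 
(an illustration of the idea, not the committed patch):
{code:java}
// Sketch only: a maximum lifetime <= 0 means "no limit", so the default
// lifetime should be rejected only when a positive maximum is exceeded.
static void validateLifetimes(long maxLifetime, long defaultLifetime) {
  if (maxLifetime > 0 && defaultLifetime > maxLifetime) {
    throw new IllegalArgumentException("Default lifetime " + defaultLifetime
        + " can't exceed maximum lifetime " + maxLifetime);
  }
}
{code}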






[jira] [Created] (YARN-10084) Allow inheritance of max app lifetime / default app lifetime

2020-01-13 Thread Eric Payne (Jira)
Eric Payne created YARN-10084:
-

 Summary: Allow inheritance of max app lifetime / default app 
lifetime
 Key: YARN-10084
 URL: https://issues.apache.org/jira/browse/YARN-10084
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacity scheduler
Affects Versions: 3.1.3, 3.2.1, 2.10.0
Reporter: Eric Payne
Assignee: Eric Payne


Currently, {{maximum-application-lifetime}} and 
{{default-application-lifetime}} must be set for each leaf queue. If it is not 
set for a particular leaf queue, then there will be no time limit on apps 
running in that queue. It should be possible to set 
{{yarn.scheduler.capacity.root.maximum-application-lifetime}} for the root 
queue and allow child queues to override that value if desired.
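
For illustration, the proposed inheritance could look like this in 
capacity-scheduler.xml (the leaf-queue name is only an example):
{code:xml}
<!-- Cluster-wide default set once on root and inherited by child queues. -->
<property>
  <name>yarn.scheduler.capacity.root.maximum-application-lifetime</name>
  <!-- Seconds; one week. -->
  <value>604800</value>
</property>
<!-- A leaf queue overrides the inherited value. -->
<property>
  <name>yarn.scheduler.capacity.root.dev.maximum-application-lifetime</name>
  <value>86400</value>
</property>
{code}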






[jira] [Created] (YARN-10033) TestProportionalCapacityPreemptionPolicy not initializing vcores for effective max resources

2019-12-13 Thread Eric Payne (Jira)
Eric Payne created YARN-10033:
-

 Summary: TestProportionalCapacityPreemptionPolicy not initializing 
vcores for effective max resources
 Key: YARN-10033
 URL: https://issues.apache.org/jira/browse/YARN-10033
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacity scheduler, test
Affects Versions: 3.3.0
Reporter: Eric Payne


TestProportionalCapacityPreemptionPolicy#testPreemptionWithVCoreResource is 
preempting more containers than would happen on a real cluster.
This is because the process for mocking CS queues in 
{{TestProportionalCapacityPreemptionPolicy}} fails to take into consideration 
vcores when mocking effective max resources.
This causes miscalculations of how many vcores to preempt when DRF is 
being used in the test:
{code:title=TempQueuePerPartition#offer}
Resource absMaxCapIdealAssignedDelta = Resources.componentwiseMax(
Resources.subtract(getMax(), idealAssigned),
Resource.newInstance(0, 0));
{code}
In the above code, the preemption policy is offering resources to an 
underserved queue. {{getMax()}} will use the effective max resource if it 
exists. Since this test is mocking effective max resources, it will return that 
value. However, since the mock doesn't include vcores, the test treats memory 
as the dominant resource and awards too many preempted containers to the 
underserved queue.
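
A minimal, self-contained sketch (assuming the Hadoop {{Resources}} utility API 
used in the snippet above) of how a memory-only mocked max distorts the offer 
calculation:
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class MockedMaxDemo {
  public static void main(String[] args) {
    // Effective max mocked with memory only (vcores left at 0), as the test
    // does, versus an ideal assignment that does include vcores.
    Resource mockedMax = Resource.newInstance(100 * 1024, 0);
    Resource idealAssigned = Resource.newInstance(40 * 1024, 40);

    // Same computation as TempQueuePerPartition#offer above.
    Resource delta = Resources.componentwiseMax(
        Resources.subtract(mockedMax, idealAssigned),
        Resource.newInstance(0, 0));

    // Prints <memory:61440, vCores:0>: the negative vcore headroom is
    // clamped to zero, so memory alone drives how much is offered, and the
    // policy preempts more containers than a real cluster would.
    System.out.println(delta);
  }
}
{code}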







[jira] [Created] (YARN-10009) DRF can treat minimum user limit percent as a max when custom resource is defined

2019-12-02 Thread Eric Payne (Jira)
Eric Payne created YARN-10009:
-

 Summary: DRF can treat minimum user limit percent as a max when 
custom resource is defined
 Key: YARN-10009
 URL: https://issues.apache.org/jira/browse/YARN-10009
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Eric Payne


| | Memory | Vcores | res_1 |
| Queue1 Totals | 20GB | 100 | 80 |
| Resources requested by App1 in Queue1 | 8GB (40% of total) | 8 (8% of total) | 80 (100% of total) |

In the previous use case:
- Queue1 has a value of 25 for {{minimum-user-limit-percent}}
- User1 has requested 8 containers with {{}} 
each
- {{res_1}} will be the dominant resource in this case.

All 8 containers should be assigned by the capacity scheduler, but with min 
user limit pct set to 25, only 3 containers are assigned.






[jira] [Resolved] (YARN-9773) Add QueueMetrics for Custom Resources

2019-10-17 Thread Eric Payne (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne resolved YARN-9773.
--
Fix Version/s: 3.1.4
   3.2.2
   3.3.0
   Resolution: Fixed

Thanks, [~maniraj...@gmail.com]. I have committed this to trunk, branch-3.2, and 
branch-3.1.

> Add QueueMetrics for Custom Resources
> -
>
> Key: YARN-9773
> URL: https://issues.apache.org/jira/browse/YARN-9773
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Manikandan R
>Assignee: Manikandan R
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 3.1.4
>
> Attachments: YARN-9773.001.patch, YARN-9773.002.patch, 
> YARN-9773.003.patch
>
>
> Although the custom resource metrics are calculated and saved as a 
> QueueMetricsForCustomResources object within the QueueMetrics class, the JMX 
> and Simon QueueMetrics do not report that information for custom resources. 






[jira] [Created] (YARN-9911) Backport YARN-9773 (Add QueueMetrics for Custom Resources) to branch-2 and branch-2.10

2019-10-17 Thread Eric Payne (Jira)
Eric Payne created YARN-9911:


 Summary: Backport YARN-9773 (Add QueueMetrics for Custom 
Resources) to branch-2 and branch-2.10
 Key: YARN-9911
 URL: https://issues.apache.org/jira/browse/YARN-9911
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacity scheduler, yarn
Affects Versions: 2.10.1, 2.11.0
Reporter: Eric Payne


The feature for tracking queue metrics for custom resources was added in 
YARN-9773. We would like to utilize this same feature in branch-2.

If the same design is to be backported to branch-2, several prerequisites must 
also be backported. Some (but perhaps not all) are listed below. An alternative 
design may be preferable.

{panel:title=Prerequisites for YARN-9773}
YARN-7541
YARN-5707
YARN-7739
YARN-8202
YARN-8750 (backported to branch-2 and branch-2.10)
YARN-8842 (backported to 3.2, 3.1--still needs to go into branch-2)
{panel}






[jira] [Created] (YARN-9894) CapacitySchedulerPerf test for measuring hundreds of apps in a large number of queues.

2019-10-11 Thread Eric Payne (Jira)
Eric Payne created YARN-9894:


 Summary: CapacitySchedulerPerf test for measuring hundreds of apps 
in a large number of queues.
 Key: YARN-9894
 URL: https://issues.apache.org/jira/browse/YARN-9894
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacity scheduler, test
Affects Versions: 3.1.3, 3.2.1, 2.8.5, 2.9.2
Reporter: Eric Payne


I have developed a unit test based on the existing TestCapacitySchedulerPerf 
tests that will measure the performance of a configurable number of apps in a 
configurable number of queues. It will also test the performance of a cluster 
that has many queues but only a portion of them are active.

{code:title=For example:}
$ mvn test \
  -Dtest=TestCapacitySchedulerPerf#testUserLimitThroughputWithManyQueues \
  -DRunCapacitySchedulerPerfTests=true \
  -DNumberOfQueues=100 \
  -DNumberOfApplications=200 \
  -DPercentActiveQueues=100
{code}

- Parameters:
-- RunCapacitySchedulerPerfTests=true:
Needed in order to trigger the test
-- NumberOfQueues
Configurable number of queues
-- NumberOfApplications
Total number of apps to run in the whole cluster, distributed evenly across all 
queues
-- PercentActiveQueues
Percentage of the queues that contain active applications






[jira] [Created] (YARN-9756) Create metric that sums total memory/vcores preempted per round

2019-08-16 Thread Eric Payne (JIRA)
Eric Payne created YARN-9756:


 Summary: Create metric that sums total memory/vcores preempted per 
round
 Key: YARN-9756
 URL: https://issues.apache.org/jira/browse/YARN-9756
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacity scheduler
Affects Versions: 3.1.2, 2.8.5, 3.0.3, 2.9.2, 3.2.0
Reporter: Eric Payne









[jira] [Resolved] (YARN-9685) NPE when rendering the info table of leaf queue in non-accessible partitions

2019-08-08 Thread Eric Payne (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne resolved YARN-9685.
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.1.3
   3.2.1
   3.3.0

Thanks again, [~Tao Yang]. I have committed to trunk, branch-3.2, and 
branch-3.1. Prior releases did not have the issue.

> NPE when rendering the info table of leaf queue in non-accessible partitions
> 
>
> Key: YARN-9685
> URL: https://issues.apache.org/jira/browse/YARN-9685
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.3.0
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: YARN-9685.001.patch
>
>
> I found incomplete queue info shown on the scheduler page and an NPE in the RM 
> log when rendering the info table of a leaf queue in non-accessible partitions.
> {noformat}
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderQueueCapacityInfo(CapacitySchedulerPage.java:163)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderLeafQueueInfoWithPartition(CapacitySchedulerPage.java:108)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.render(CapacitySchedulerPage.java:97)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
> at 
> org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
> at org.apache.hadoop.yarn.webapp.View.render(View.java:243)
> {noformat}
> The direct cause is that the PartitionQueueCapacitiesInfo of leaf queues in 
> non-accessible partitions is incomplete (some fields are null, such as 
> configuredMinResource/configuredMaxResource/effectiveMinResource/effectiveMaxResource),
>  but some places in CapacitySchedulerPage don't account for that.






[ANNOUNCE] Eric Badger is now a committer!

2019-03-05 Thread Eric Payne
It is my pleasure to announce that Eric Badger has accepted an invitation to 
become a Hadoop Core committer.

Congratulations, Eric! This is well-deserved!

-Eric Payne


Re: [VOTE] Release Apache Hadoop 3.2.0 - RC1

2019-01-15 Thread Eric Payne
+1 (binding)

- RM refresh updates values as expected
- Streaming jobs complete successfully
- Moving apps between queues succeeds
- Inter-queue preemption works as expected
- Successfully ran selected yarn unit tests.

===
Eric Payne
===


On Tuesday, January 8, 2019, 5:42:46 AM CST, Sunil G  wrote: 





Hi folks,


Thanks to all of you who helped in this release [1] and for helping to vote
for RC0. I have created the second release candidate (RC1) for Apache Hadoop
3.2.0.


Artifacts for this RC are available here:

http://home.apache.org/~sunilg/hadoop-3.2.0-RC1/


RC tag in git is release-3.2.0-RC1.



The maven artifacts are available via repository.apache.org at
https://repository.apache.org/content/repositories/orgapachehadoop-1178/


This vote will run 7 days (5 weekdays), ending on 14th Jan at 11:59 pm PST.



3.2.0 contains 1092 [2] fixed JIRA issues since 3.1.0. The feature additions
below are the highlights of this release.

1. Node Attributes Support in YARN

2. Hadoop Submarine project for running Deep Learning workloads on YARN

3. Support service upgrade via YARN Service API and CLI

4. HDFS Storage Policy Satisfier

5. Support Windows Azure Storage - Blob file system in Hadoop

6. Phase 3 improvements for S3Guard and Phase 5 improvements S3a

7. Improvements in Router-based HDFS federation



Thanks to Wangda, Vinod, Marton for helping me in preparing the release.

I have done some testing with my pseudo cluster. My +1 to start.



Regards,

Sunil



[1]

https://lists.apache.org/thread.html/68c1745dcb65602aecce6f7e6b7f0af3d974b1bf0048e7823e58b06f@%3Cyarn-dev.hadoop.apache.org%3E

[2] project in (YARN, HADOOP, MAPREDUCE, HDFS) AND fixVersion in (3.2.0)
AND fixVersion not in (3.1.0, 3.0.0, 3.0.0-beta1) AND status = Resolved
ORDER BY fixVersion ASC




Re: [VOTE] Release Apache Hadoop 3.2.0 - RC0

2018-11-29 Thread Eric Payne
The problem is not with preemption. The yarn-site.xml that I use for my 
pseudo-cluster includes a second xml:

    <xi:include href=".../yarn-scheduler.xml" />

The property for yarn.resourcemanager.scheduler.monitor.enable = true is in 
this yarn-scheduler.xml.

This value IS READ when the RM starts.

However, when the refreshQueues command is run, this value IS NOT READ.

So, it looks like xml include files are not read on refresh. This will affect 
any property. I just happened to notice it on the preemption properties.
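
For reference, a yarn-site.xml that pulls in a second file via XInclude looks 
roughly like this (a sketch reconstructed from the description above; the path 
is elided as in the original):

    <configuration xmlns:xi="http://www.w3.org/2001/XInclude">
      <xi:include href=".../yarn-scheduler.xml" />
      ...
    </configuration>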

I would like input from all of you to determine if this is a blocker for 
release. I'm on the fence.

Thanks,
-Eric






On Wednesday, November 28, 2018, 4:58:50 PM CST, Eric Payne 
 wrote: 





Sunil,

So, the basic symptoms are that if preemption is enabled on any queue, the 
preemption is disabled after a 'yarn rmadmin -refreshQueues'. In addition, all of 
the preemption-specific properties are set back to the default values.

This was introduced in branch-3.1, so it is NOT new behavior for release 3.2.0. 
I am still tracking down the cause. I will open a JIRA once I do further 
investigation if there is not one already.

This will be a problem for installations which use preemption and which use the 
refreshQueues feature.

Thanks,
-Eric


On Wednesday, November 28, 2018, 11:47:06 AM CST, Eric Payne 
 wrote: 





Sunil, thanks for all of the hard work on this release.

I have discovered that queue refresh doesn't work in some cases. For example, 
when I change yarn.scheduler.capacity.root.default.disable_preemption, it 
doesn't take effect unless I restart the RM.

I am still investigating, but I thought I should bring this up asap.

Thanks,
-Eric




On Friday, November 23, 2018, 6:07:04 AM CST, Sunil G  
wrote: 





Hi folks,



Thanks to all contributors who helped in this release [1]. I have created

first release candidate (RC0) for Apache Hadoop 3.2.0.


Artifacts for this RC are available here:

http://home.apache.org/~sunilg/hadoop-3.2.0-RC0/



RC tag in git is release-3.2.0-RC0.



The maven artifacts are available via repository.apache.org at

https://repository.apache.org/content/repositories/orgapachehadoop-1174/


This vote will run 7 days (5 weekdays), ending on Nov 30 at 11:59 pm PST.



3.2.0 contains 1079 [2] fixed JIRA issues since 3.1.0. The feature additions
below are the highlights of this release.

1. Node Attributes Support in YARN

2. Hadoop Submarine project for running Deep Learning workloads on YARN

3. Support service upgrade via YARN Service API and CLI

4. HDFS Storage Policy Satisfier

5. Support Windows Azure Storage - Blob file system in Hadoop

6. Phase 3 improvements for S3Guard and Phase 5 improvements S3a

7. Improvements in Router-based HDFS federation



Thanks to Wangda, Vinod, Marton for helping me in preparing the release.

I have done some testing with my pseudo cluster. My +1 to start.



Regards,

Sunil



[1]

https://lists.apache.org/thread.html/68c1745dcb65602aecce6f7e6b7f0af3d974b1bf0048e7823e58b06f@%3Cyarn-dev.hadoop.apache.org%3E

[2] project in (YARN, HADOOP, MAPREDUCE, HDFS) AND fixVersion in (3.2.0)
AND fixVersion not in (3.1.0, 3.0.0, 3.0.0-beta1) AND status = Resolved
ORDER BY fixVersion ASC




Re: [VOTE] Release Apache Hadoop 3.2.0 - RC0

2018-11-28 Thread Eric Payne
Sunil,

So, the basic symptoms are that if preemption is enabled on any queue, the 
preemption is disabled after a 'yarn rmadmin -refreshQueues'. In addition, all of 
the preemption-specific properties are set back to the default values.

This was introduced in branch-3.1, so it is NOT new behavior for release 3.2.0. 
I am still tracking down the cause. I will open a JIRA once I do further 
investigation if there is not one already.

This will be a problem for installations which use preemption and which use the 
refreshQueues feature.

Thanks,
-Eric


On Wednesday, November 28, 2018, 11:47:06 AM CST, Eric Payne 
 wrote: 





Sunil, thanks for all of the hard work on this release.

I have discovered that queue refresh doesn't work in some cases. For example, 
when I change yarn.scheduler.capacity.root.default.disable_preemption, it 
doesn't take effect unless I restart the RM.

I am still investigating, but I thought I should bring this up asap.

Thanks,
-Eric




On Friday, November 23, 2018, 6:07:04 AM CST, Sunil G  
wrote: 





Hi folks,



Thanks to all contributors who helped in this release [1]. I have created

first release candidate (RC0) for Apache Hadoop 3.2.0.


Artifacts for this RC are available here:

http://home.apache.org/~sunilg/hadoop-3.2.0-RC0/



RC tag in git is release-3.2.0-RC0.



The maven artifacts are available via repository.apache.org at

https://repository.apache.org/content/repositories/orgapachehadoop-1174/


This vote will run 7 days (5 weekdays), ending on Nov 30 at 11:59 pm PST.



3.2.0 contains 1079 [2] fixed JIRA issues since 3.1.0. The feature additions
below are the highlights of this release.

1. Node Attributes Support in YARN

2. Hadoop Submarine project for running Deep Learning workloads on YARN

3. Support service upgrade via YARN Service API and CLI

4. HDFS Storage Policy Satisfier

5. Support Windows Azure Storage - Blob file system in Hadoop

6. Phase 3 improvements for S3Guard and Phase 5 improvements S3a

7. Improvements in Router-based HDFS federation



Thanks to Wangda, Vinod, Marton for helping me in preparing the release.

I have done some testing with my pseudo cluster. My +1 to start.



Regards,

Sunil



[1]

https://lists.apache.org/thread.html/68c1745dcb65602aecce6f7e6b7f0af3d974b1bf0048e7823e58b06f@%3Cyarn-dev.hadoop.apache.org%3E

[2] project in (YARN, HADOOP, MAPREDUCE, HDFS) AND fixVersion in (3.2.0)
AND fixVersion not in (3.1.0, 3.0.0, 3.0.0-beta1) AND status = Resolved
ORDER BY fixVersion ASC




Re: [VOTE] Release Apache Hadoop 2.9.2 (RC0)

2018-11-19 Thread Eric Payne
+1 (binding)

- Built from source
- Installed on 6-node pseudo cluster
- Tested intra- and inter-queue preemption, user weights
- Ran streaming jobs, word count, and teragen/sort tests

Thanks Akira for all of the hard work.
-Eric Payne



 

On Tuesday, November 13, 2018, 7:02:51 PM CST, Akira Ajisaka 
 wrote:  
 
 Hi folks,

I have put together a release candidate (RC0) for Hadoop 2.9.2. It
includes 204 bug fixes and improvements since 2.9.1. [1]

The RC is available at http://home.apache.org/~aajisaka/hadoop-2.9.2-RC0/
Git signed tag is release-2.9.2-RC0 and the checksum is
826afbeae31ca687bc2f8471dc841b66ed2c6704
The maven artifacts are staged at
https://repository.apache.org/content/repositories/orgapachehadoop-1166/

You can find my public key at:
https://dist.apache.org/repos/dist/release/hadoop/common/KEYS

Please try the release and vote. The vote will run for 5 days.

[1] https://s.apache.org/2.9.2-fixed-jiras

Thanks,
Akira


  

Re: [VOTE] Release Apache Hadoop 3.1.1 - RC0

2018-08-07 Thread Eric Payne
Thanks Wangda for creating this release.

+1 (binding)
Tested:
- Built from source
- Deployed to 6-node, multi-tenant, unsecured pseudo cluster with hierarchical 
queue structure (CS)
- Refreshed queue (CS) properties
- Intra-queue preemption (CS)
- inter-queue preemption (CS)
- User weights (CS)

Issues:
- Inter-queue preemption seems to be preempting unnecessarily (flapping) when 
the queue balancing feature is enabled. This does not seem to be specific to 
this release.
- The preemption-to-balance-queue-after-satisfied.enabled property seems to 
always be enabled, but again, that is not specific to this release.


Eric


On Thursday, August 2, 2018, 1:44:22 PM CDT, Wangda Tan  
wrote: 





Hi folks,

I've created RC0 for Apache Hadoop 3.1.1. The artifacts are available here:

http://people.apache.org/~wangda/hadoop-3.1.1-RC0/

The RC tag in git is release-3.1.1-RC0:
https://github.com/apache/hadoop/commits/release-3.1.1-RC0

The maven artifacts are available via repository.apache.org at
https://repository.apache.org/content/repositories/orgapachehadoop-1139/

You can find my public key at
http://svn.apache.org/repos/asf/hadoop/common/dist/KEYS

This vote will run 5 days from now.

3.1.1 contains 435 [1] fixed JIRA issues since 3.1.0.

I have done testing with a pseudo cluster and distributed shell job. My +1
to start.

Best,
Wangda Tan

[1] project in (YARN, HADOOP, MAPREDUCE, HDFS) AND fixVersion in (3.1.1)
ORDER BY priority DESC




[jira] [Resolved] (YARN-8425) Yarn container getting killed due to running beyond physical memory limits

2018-06-14 Thread Eric Payne (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne resolved YARN-8425.
--
Resolution: Not A Bug

> Yarn container getting killed due to running beyond physical memory limits
> --
>
> Key: YARN-8425
> URL: https://issues.apache.org/jira/browse/YARN-8425
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: applications, container-queuing, yarn
>Affects Versions: 2.7.6
>Reporter: Tapas Sen
>Priority: Major
> Attachments: yarn_configuration_1.PNG, yarn_configuration_2.PNG, 
> yarn_configuration_3.PNG
>
>
> Hi,
> I am getting this error:
>  
> 2018-06-12 17:59:07,193 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics 
> report from attempt_1527758146858_45040_m_08_3: Container 
> [pid=15498,containerID=container_e60_1527758146858_45040_01_41] is 
> running beyond physical memory limits. Current usage: 8.1 GB of 8 GB physical 
> memory used; 12.2 GB of 16.8 GB virtual memory used. Killing container.
>  
> The YARN resource configuration is in the attachments. 
>  
>  Any leads would be appreciated.
>  
>  
>  






Re: [VOTE] Release Apache Hadoop 3.0.3 (RC0)

2018-06-11 Thread Eric Payne
Sorry, Yongjun. My +1 is also binding.

+1 (binding)
-Eric Payne

On Friday, June 1, 2018, 12:25:36 PM CDT, Eric Payne 
 wrote:  
 
 

Thanks a lot, Yongjun, for your hard work on this release.

+1
- Built from source
- Installed on 6 node pseudo cluster


Tested the following in the Capacity Scheduler:
- Verified that running apps in labelled queues restricts tasks to the labelled 
nodes.
- Verified that various queue config properties for CS are refreshable
- Verified streaming jobs work as expected
- Verified that user weights work as expected
- Verified that FairOrderingPolicy in a CS queue will evenly assign resources
- Verified running yarn shell application runs as expected







On Friday, June 1, 2018, 12:48:26 AM CDT, Yongjun Zhang  
wrote: 





Greetings all,

I've created the first release candidate (RC0) for Apache Hadoop
3.0.3. This is our next maintenance release to follow up 3.0.2. It includes
about 249
important fixes and improvements, among which there are 8 blockers. See
https://issues.apache.org/jira/issues/?filter=12343997

The RC artifacts are available at:
https://dist.apache.org/repos/dist/dev/hadoop/3.0.3-RC0/

The maven artifacts are available via
https://repository.apache.org/content/repositories/orgapachehadoop-1126

Please try the release and vote; the vote will run for the usual 5 working
days, ending on 06/07/2018 PST time. Would really appreciate your
participation here.

I bumped into quite some issues along the way, many thanks to quite a few
people who helped, especially Sammi Chen, Andrew Wang, Junping Du, Eddy Xu.

Thanks,

--Yongjun
  

Re: [VOTE] Release Apache Hadoop 3.0.2 (RC1)

2018-04-18 Thread Eric Payne
 Thanks for all of your hard work to produce this release!
+1 (binding)
I tested the following, and all seems well:
- built from source
- brought up on pseudo-cluster with 4 NMs
- tested yarn shell with and without container-preserving restart
- tested Capacity Scheduler FairOrderingPolicy, with and without size-based 
weight
- tested user weights

Thanks!
Eric Payne



On Monday, April 16, 2018, 7:00:03 PM CDT, Lei Xu  wrote:  
 
 Hi, All

I've created release candidate RC-1 for Apache Hadoop 3.0.2, to
address missing source jars in the maven repository in RC-0.

Thanks Ajay Kumar for spotting the error.

Please note: this is an amendment for Apache Hadoop 3.0.1 release to
fix shaded jars in apache maven repository. The codebase of 3.0.2
release is the same as 3.0.1.  New bug fixes will be included in
Apache Hadoop 3.0.3 instead.

The release page is:
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3.0+Release

New RC is available at: http://home.apache.org/~lei/hadoop-3.0.2-RC1/

The git tag is release-3.0.2-RC1, and the latest commit is
5c141f7c0f24c12cb8704a6ccc1ff8ec991f41ee, which is the same as RC-0.

The maven artifacts are available at:
https://repository.apache.org/content/repositories/orgapachehadoop-1102/

Please try the release, especially, *verify the maven artifacts*, and vote.

The vote will run 5 days, ending 4/21/2018.

Here is my +1.

Best,


  

Re: [VOTE] Release Apache Hadoop 3.0.2 (RC0)

2018-04-09 Thread Eric Payne
Thanks a lot for working to produce this release.

+1 (binding)
Tested the following:
- built from source and installed on 6-node pseudo-cluster
- tested Capacity Scheduler FairOrderingPolicy and FifoOrderingPolicy to 
determine that capacity was assigned as expected in each case
- tested user weights with FifoOrderingPolicy to ensure that weights were 
assigned to users as expected.

Eric Payne






On Friday, April 6, 2018, 1:17:10 PM CDT, Lei Xu  wrote: 





Hi, All

I've created release candidate RC-0 for Apache Hadoop 3.0.2.

Please note: this is an amendment for Apache Hadoop 3.0.1 release to
fix shaded jars in apache maven repository. The codebase of 3.0.2
release is the same as 3.0.1.  New bug fixes will be included in
Apache Hadoop 3.0.3 instead.

The release page is:
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3.0+Release

New RC is available at: http://home.apache.org/~lei/hadoop-3.0.2-RC0/

The git tag is release-3.0.2-RC0, and the latest commit is
5c141f7c0f24c12cb8704a6ccc1ff8ec991f41ee

The maven artifacts are available at
https://repository.apache.org/content/repositories/orgapachehadoop-1096/

Please try the release, especially, *verify the maven artifacts*, and vote.

The vote will run 5 days, ending 4/11/2018.

Thanks for everyone who helped to spot the error and proposed fixes!




Re: [VOTE] Release Apache Hadoop 3.1.0 (RC1)

2018-04-03 Thread Eric Payne
 +1 (binding)
Thanks Wangda for doing the work to produce this release.
I did the following to test the release:
- Built from source
- Installed on 6-node pseudo cluster
- Interacted with RM CLI and GUI
- Tested streaming jobs
- Tested yarn distributed shell jobs
- Tested Max AM Resource Percent

- Tested simple inter-queue preemption
- Tested priority first intra-queue preemption

- Tested userlimit first intra-queue preemption

Thanks,
Eric Payne


===
On Thursday, March 29, 2018, 11:15:51 PM CDT, Wangda Tan 
 wrote:  
 
 Hi folks,

Thanks to the many who helped with this release since Dec 2017 [1]. We've
created RC1 for Apache Hadoop 3.1.0. The artifacts are available here:

http://people.apache.org/~wangda/hadoop-3.1.0-RC1

The RC tag in git is release-3.1.0-RC1. Last git commit SHA is
16b70619a24cdcf5d3b0fcf4b58ca77238ccbe6d

The maven artifacts are available via repository.apache.org at
https://repository.apache.org/content/repositories/orgapachehadoop-1090/
This vote will run 5 days, ending on Apr 3 at 11:59 pm Pacific.

3.1.0 contains 766 [2] fixed JIRA issues since 3.0.0. Notable additions
include the first class GPU/FPGA support on YARN, Native services, Support
rich placement constraints in YARN, S3-related enhancements, allow HDFS
block replicas to be provided by an external storage system, etc.

For 3.1.0 RC0 vote discussion, please see [3].

We’d like to use this as a starting release for 3.1.x [1]; depending on how
it goes, we will stabilize it and potentially release 3.1.1 in several weeks as
the stable release.

We have done testing with a pseudo cluster:
- Ran distributed job.
- GPU scheduling/isolation.
- Placement constraints (intra-application anti-affinity) by using
distributed shell.

My +1 to start.

Best,
Wangda/Vinod

[1]
https://lists.apache.org/thread.html/b3fb3b6da8b6357a68513a6dfd104bc9e19e559aedc5ebedb4ca08c8@%3Cyarn-dev.hadoop.apache.org%3E
[2] project in (YARN, HADOOP, MAPREDUCE, HDFS) AND fixVersion in (3.1.0)
AND fixVersion not in (3.0.0, 3.0.0-beta1) AND status = Resolved ORDER BY
fixVersion ASC
[3]
https://lists.apache.org/thread.html/b3a7dc075b7329fd660f65b48237d72d4061f26f83547e41d0983ea6@%3Cyarn-dev.hadoop.apache.org%3E
  

Re: [VOTE] Release Apache Hadoop 3.0.1 (RC1)

2018-03-20 Thread Eric Payne
 Thanks for working on this release!
+1 (binding)
I tested the following:
- yarn distributed shell job

- yarn streaming job

- inter-queue preemption

- compared behavior of fair and fifo ordering policy

- both userlimit_first mode and priority_first mode of intra-queue preemption

Eric Payne



On Saturday, March 17, 2018, 11:11:32 PM CDT, Lei Xu  
wrote:  
 
 Hi, all

I've created release candidate RC-1 for Apache Hadoop 3.0.1

Apache Hadoop 3.0.1 will be the first bug fix release for the Apache
Hadoop 3.0 release line. It includes 49 bug fixes and security fixes, of
which 12 are blockers and 17 are critical.

Please note:
* HDFS-12990. Change default NameNode RPC port back to 8020. It makes
incompatible changes to Hadoop 3.0.0.  After 3.0.1 releases, Apache
Hadoop 3.0.0 will be deprecated due to this change.

The release page is:
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3.0+Release

New RC is available at: http://home.apache.org/~lei/hadoop-3.0.1-RC1/

The git tag is release-3.0.1-RC1, and the latest commit is
496dc57cc2e4f4da117f7a8e3840aaeac0c1d2d0

The maven artifacts are available at:
https://repository.apache.org/content/repositories/orgapachehadoop-1081/

Please try the release and vote; the vote will run for the usual 5
days, ending on 3/22/2018 at 6pm PST.

Thanks!

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org

  

[jira] [Created] (YARN-7947) Capacity Scheduler intra-queue preemption can NPE for non-schedulable apps

2018-02-19 Thread Eric Payne (JIRA)
Eric Payne created YARN-7947:


 Summary: Capacity Scheduler intra-queue preemption can NPE for 
non-schedulable apps
 Key: YARN-7947
 URL: https://issues.apache.org/jira/browse/YARN-7947
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler, scheduler preemption
Reporter: Eric Payne


Intra-queue preemption policy can cause NPE for pending users with no 
schedulable apps.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7927) YARN-7813 caused test failure in TestRMWebServicesSchedulerActivities

2018-02-13 Thread Eric Payne (JIRA)
Eric Payne created YARN-7927:


 Summary: YARN-7813 caused test failure in 
TestRMWebServicesSchedulerActivities 
 Key: YARN-7927
 URL: https://issues.apache.org/jira/browse/YARN-7927
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Eric Payne






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7813) Capacity Scheduler Intra-queue Preemption should be configurable for each queue

2018-01-24 Thread Eric Payne (JIRA)
Eric Payne created YARN-7813:


 Summary: Capacity Scheduler Intra-queue Preemption should be 
configurable for each queue
 Key: YARN-7813
 URL: https://issues.apache.org/jira/browse/YARN-7813
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacity scheduler, scheduler preemption
Affects Versions: 3.0.0, 2.8.3, 2.9.0
Reporter: Eric Payne
Assignee: Eric Payne


Just as inter-queue (a.k.a. cross-queue) preemption is configurable per queue, 
intra-queue (a.k.a. in-queue) preemption should be configurable per queue. If a 
queue does not have a setting for intra-queue preemption, it should inherit its 
parent's value.
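
As a sketch, the per-queue knob might look like this in capacity-scheduler.xml 
(the property name below is only an assumption for illustration; the final name 
would be settled in the patch):
{noformat}
<!-- Hypothetical name: per-queue, intra-queue analog of disable_preemption -->
<property>
  <name>yarn.scheduler.capacity.root.queue1.intra-queue-preemption.disable_preemption</name>
  <value>true</value>
</property>
{noformat}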



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Resolved] (YARN-7424) Capacity Scheduler Intra-queue preemption: add property to only preempt up to configured MULP

2018-01-10 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne resolved YARN-7424.
--
Resolution: Invalid

bq. In order to create the "desired" behavior, we would have to fundamentally 
change the way the capacity scheduler works,
Closing this JIRA.

> Capacity Scheduler Intra-queue preemption: add property to only preempt up to 
> configured MULP
> -
>
> Key: YARN-7424
> URL: https://issues.apache.org/jira/browse/YARN-7424
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 3.0.0-beta1, 2.8.2
>Reporter: Eric Payne
>Assignee: Eric Payne
>
> If the queue's configured minimum user limit percent (MULP) is something 
> small like 1%, all users will max out well over their MULP until 100 users 
> have apps in the queue. Since the intra-queue preemption monitor tries to 
> balance the resource among the users, most of the time in this use case it 
> will be preempting containers on behalf of users that are already over their 
> MULP guarantee.
> This JIRA proposes that a property should be provided so that a queue can be 
> configured to only preempt on behalf of a user until that user has reached 
> its MULP.
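
To make the imbalance concrete with hypothetical numbers:
{noformat}
queue capacity              = 100 GB
minimum-user-limit-percent  = 1%   =>  per-user MULP guarantee = 1 GB
active users                = 4
balanced per-user share     ~ 100 GB / 4 = 25 GB
{noformat}
All four users sit far above the 1 GB guarantee, yet the monitor keeps 
preempting to balance them near 25 GB each; the proposed property would have 
stopped preempting on a user's behalf once that user held 1 GB.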



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7728) Expose and expand container preemptions in Capacity Scheduler queue metrics

2018-01-10 Thread Eric Payne (JIRA)
Eric Payne created YARN-7728:


 Summary: Expose and expand container preemptions in Capacity 
Scheduler queue metrics
 Key: YARN-7728
 URL: https://issues.apache.org/jira/browse/YARN-7728
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 3.0.0, 2.8.3, 2.9.0
Reporter: Eric Payne
Assignee: Eric Payne


YARN-1047 exposed queue metrics for the number of preempted containers to the 
fair scheduler. I would like to also expose these to the capacity scheduler and 
add metrics for the amount of lost memory seconds and vcore seconds.
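
A minimal sketch of the proposed accounting (class, method, and metric names 
here are illustrative assumptions, not the final QueueMetrics API):
{code}
// Accumulates "lost" resource-seconds for preempted containers.
public class PreemptionAccounting {
  private long preemptedContainers;
  private long lostMemoryMbSeconds;
  private long lostVcoreSeconds;

  /** Record one preempted container's lifetime cost. */
  public void recordPreemption(long memoryMb, int vcores,
      long startMillis, long preemptMillis) {
    long elapsedSeconds = (preemptMillis - startMillis) / 1000;
    preemptedContainers++;
    lostMemoryMbSeconds += memoryMb * elapsedSeconds;
    lostVcoreSeconds += (long) vcores * elapsedSeconds;
  }

  public static void main(String[] args) {
    PreemptionAccounting metrics = new PreemptionAccounting();
    // A 2048 MB / 1 vcore container preempted after running for 60 seconds.
    metrics.recordPreemption(2048, 1, 0L, 60_000L);
    System.out.println(metrics.preemptedContainers + " container(s) preempted, "
        + metrics.lostMemoryMbSeconds + " MB-seconds and "
        + metrics.lostVcoreSeconds + " vcore-seconds lost");
  }
}
{code}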



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Resolved] (YARN-7658) Capacity scheduler UI hangs when rendering if labels are present

2017-12-14 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne resolved YARN-7658.
--
Resolution: Duplicate

> Capacity scheduler UI hangs when rendering if labels are present
> 
>
> Key: YARN-7658
> URL: https://issues.apache.org/jira/browse/YARN-7658
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>    Reporter: Eric Payne
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7658) Capacity scheduler UI hangs when rendering if labels are present

2017-12-14 Thread Eric Payne (JIRA)
Eric Payne created YARN-7658:


 Summary: Capacity scheduler UI hangs when rendering if labels are 
present
 Key: YARN-7658
 URL: https://issues.apache.org/jira/browse/YARN-7658
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler
Reporter: Eric Payne






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 2.7.5 (RC1)

2017-12-13 Thread Eric Payne
Thanks for the hard work on this release, Konstantin.
+1 (binding)
- Built from source
- Verified that refreshing of queues works as expected.

- Verified can run multiple users in a single queue
- Ran terasort test
- Verified that cross-queue preemption works as expected
Thanks,
Eric Payne

  From: Konstantin Shvachko 
 To: "common-...@hadoop.apache.org" ; 
"hdfs-...@hadoop.apache.org" ; 
"mapreduce-...@hadoop.apache.org" ; 
"yarn-dev@hadoop.apache.org"  
 Sent: Thursday, December 7, 2017 9:22 PM
 Subject: [VOTE] Release Apache Hadoop 2.7.5 (RC1)
   
Hi everybody,

I updated CHANGES.txt and fixed documentation links.
Also committed  MAPREDUCE-6165, which fixes a consistently failing test.

This is RC1 for the next dot release of the Apache Hadoop 2.7 line. The
previous one, 2.7.4, was released on August 4, 2017.
Release 2.7.5 includes critical bug fixes and optimizations. See more
details in Release Note:
http://home.apache.org/~shv/hadoop-2.7.5-RC1/releasenotes.html

The RC1 is available at: http://home.apache.org/~shv/hadoop-2.7.5-RC1/

Please give it a try and vote on this thread. The vote will run for 5 days
ending 12/13/2017.

My up to date public key is available from:
https://dist.apache.org/repos/dist/release/hadoop/common/KEYS

Thanks,
--Konstantin


   

[jira] [Created] (YARN-7619) Max AM Resource value in CS UI is different for every user

2017-12-06 Thread Eric Payne (JIRA)
Eric Payne created YARN-7619:


 Summary: Max AM Resource value in CS UI is different for every user
 Key: YARN-7619
 URL: https://issues.apache.org/jira/browse/YARN-7619
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler, yarn
Affects Versions: 3.0.0-beta1, 2.9.0, 2.8.2, 3.1.0
Reporter: Eric Payne
Assignee: Eric Payne


YARN-7245 addressed the problem that the {{Max AM Resource}} in the capacity 
scheduler UI used to contain the queue-level AM limit instead of the user-level 
AM limit. It fixed this by using the user-specific AM limit that is calculated 
in {{LeafQueue#activateApplications}}, stored in each user's {{LeafQueue#User}} 
object, and retrieved via {{UserInfo#getResourceUsageInfo}}.

The problem is that this user-specific AM limit depends on the activity of 
other users and other applications in a queue, and it is only calculated and 
updated when a user's application is activated. So, when 
{{CapacitySchedulerPage}} retrieves the user-specific AM limit, it is a stale 
value unless an application was recently activated for a particular user.
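
A minimal sketch of that read-stale-cache pattern (simplified stand-ins for the 
LeafQueue/UserInfo code, not the actual classes):
{code}
import java.util.HashMap;
import java.util.Map;

public class UserAmLimitCache {
  private final Map<String, Long> cachedAmLimit = new HashMap<>();

  // Only recomputed here, i.e. when one of the user's apps is activated.
  void onApplicationActivated(String user, long amLimitNow) {
    cachedAmLimit.put(user, amLimitNow);
  }

  // What the scheduler page reads: possibly computed long ago, under queue
  // load (other users, other apps) that no longer exists.
  long amLimitForUi(String user) {
    return cachedAmLimit.getOrDefault(user, 0L);
  }

  public static void main(String[] args) {
    UserAmLimitCache cache = new UserAmLimitCache();
    cache.onApplicationActivated("alice", 4096L); // computed under old load
    // ... queue load changes, but no new app of alice's is activated ...
    System.out.println(cache.amLimitForUi("alice")); // still 4096: stale
  }
}
{code}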



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Re: [VOTE] Merge Absolute resource configuration support in Capacity Scheduler (YARN-5881) to trunk

2017-12-04 Thread Eric Payne
+1. Thanks Sunil for the work on this branch.
Eric

  From: Sunil G 
 To: "yarn-dev@hadoop.apache.org" ; Hdfs-dev 
; Hadoop Common ; 
"mapreduce-...@hadoop.apache.org"  
 Sent: Wednesday, November 29, 2017 7:56 PM
 Subject: [VOTE] Merge Absolute resource configuration support in Capacity 
Scheduler (YARN-5881) to trunk
   
Hi All,


Based on the discussion at [1], I'd like to start a vote to merge feature
branch

YARN-5881 to trunk. Vote will run for 7 days, ending Wednesday Dec 6 at
6:00PM PDT.


This branch adds support to configure queue capacity as absolute resource in

capacity scheduler. This will help admins who want fine control of
resources of queues.


Feature development is done at YARN-5881 [2], jenkins build is here
(YARN-7510 [3]).

All required tasks for this feature are committed. This feature changes
RM’s Capacity Scheduler only,

and we did extensive tests for the feature in the last couple of months
including performance tests.


Key points:

- The feature is turned off by default; absolute resources must be configured
to enable it.

- Detailed documentation about how to use this feature is done as part of
[4].

- No major performance degradation is observed with this branch work. SLS
and UT performance

tests are done.


There were 11 subtasks completed for this feature.


Huge thanks to everyone who helped with reviews, commits, guidance, and

technical discussion/design, including Wangda Tan, Vinod Vavilapalli,
Rohith Sharma K S, Eric Payne .


[1] :
http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201711.mbox/%3CCACYiTuhKhF1JCtR7ZFuZSEKQ4sBvN_n_tV5GHsbJ3YeyJP%2BP4Q%40mail.gmail.com%3E

[2] : https://issues.apache.org/jira/browse/YARN-5881

[3] : https://issues.apache.org/jira/browse/YARN-7510

[4] : https://issues.apache.org/jira/browse/YARN-7533


Regards

Sunil and Wangda

   

[jira] [Created] (YARN-7575) When using absolute capacity configuration with no max capacity, scheduler UI NPEs and can't grow queue

2017-11-28 Thread Eric Payne (JIRA)
Eric Payne created YARN-7575:


 Summary: When using absolute capacity configuration with no max 
capacity, scheduler UI NPEs and can't grow queue
 Key: YARN-7575
 URL: https://issues.apache.org/jira/browse/YARN-7575
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler
Reporter: Eric Payne


I encountered the following while reviewing and testing branch YARN-5881.

The design document from YARN-5881 says that for max-capacity:
{quote}
3)  For each queue, we require:
a) if max-resource not set, it automatically set to parent.max-resource
{quote}

When I try leaving {{yarn.scheduler.capacity.<queue-path>.maximum-capacity}} 
blank, the RM UI scheduler page refuses to render. It looks like it's in 
{{CapacitySchedulerPage$LeafQueueInfoBlock}}:
{noformat}
2017-11-28 11:29:16,974 [qtp43473566-220] ERROR webapp.Dispatcher: error 
handling URI: /cluster/scheduler
java.lang.reflect.InvocationTargetException
...
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderQueueCapacityInfo(CapacitySchedulerPage.java:164)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$LeafQueueInfoBlock.renderLeafQueueInfoWithoutParition(CapacitySchedulerPage.java:129)
{noformat}

Also... A job will run in the leaf queue with no max capacity set and it will 
grow to the max capacity of the cluster, but if I add resources to the node, 
the job won't grow any more even though it has pending resources.
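
For reference, a minimal sketch of the absolute-resource configuration under 
test (values are hypothetical); per the design document quoted above, omitting 
the max-capacity entry should fall back to the parent's max-resource instead of 
breaking the page:
{noformat}
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>[memory=4096,vcores=4]</value>
</property>
<!-- Leaving this one unset/blank is what triggers the NPE described above -->
<property>
  <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
  <value>[memory=8192,vcores=8]</value>
</property>
{noformat}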



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7501) Capacity Scheduler Intra-queue preemption should have a "dead zone" around user limit

2017-11-15 Thread Eric Payne (JIRA)
Eric Payne created YARN-7501:


 Summary: Capacity Scheduler Intra-queue preemption should have a 
"dead zone" around user limit
 Key: YARN-7501
 URL: https://issues.apache.org/jira/browse/YARN-7501
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacity scheduler, scheduler preemption
Affects Versions: 3.0.0-beta1, 2.9.0, 2.8.2, 3.1.0
Reporter: Eric Payne






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7496) CS Intra-queue preemption user-limit calculations are not in line with LeafQueue user-limit calculations

2017-11-14 Thread Eric Payne (JIRA)
Eric Payne created YARN-7496:


 Summary: CS Intra-queue preemption user-limit calculations are not 
in line with LeafQueue user-limit calculations
 Key: YARN-7496
 URL: https://issues.apache.org/jira/browse/YARN-7496
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.8.2
Reporter: Eric Payne
Assignee: Eric Payne


Only a problem in 2.8.

Preemption could oscillate due to the difference in how user limit is 
calculated between 2.8 and later releases.

Basically (ignoring ULF, MULP, and maybe others), the calculation for user 
limit on the Capacity Scheduler side in 2.8 is {{total used resources / number 
of active users}} while the calculation in later releases is {{total active 
resources / number of active users}}. When intra-queue preemption was 
backported to 2.8, its calculations for user limit were more aligned with the 
latter algorithm, which is in 2.9 and later releases.
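
A minimal sketch of the divergence (plain arithmetic with hypothetical numbers, 
not the actual LeafQueue code):
{code}
public class UserLimitComparison {
  public static void main(String[] args) {
    long totalUsedMb = 6000;    // resources currently used in the queue
    long totalActiveMb = 9000;  // used + pending for the queue's active users
    int activeUsers = 3;

    long limit28 = totalUsedMb / activeUsers;   // 2.8 scheduler view: 2000 MB
    long limit29 = totalActiveMb / activeUsers; // 2.9+ scheduler view: 3000 MB

    // The 2.8 intra-queue preemption backport computes something closer to
    // limit29, so monitor and scheduler can disagree: containers preempted by
    // the monitor are handed straight back by the scheduler, and so on.
    System.out.println("2.8 user limit = " + limit28
        + " MB, 2.9+ user limit = " + limit29 + " MB");
  }
}
{code}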



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7469) Capacity Scheduler Intra-queue preemption: User can starve if newest app is exactly at user limit

2017-11-09 Thread Eric Payne (JIRA)
Eric Payne created YARN-7469:


 Summary: Capacity Scheduler Intra-queue preemption: User can 
starve if newest app is exactly at user limit
 Key: YARN-7469
 URL: https://issues.apache.org/jira/browse/YARN-7469
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler, yarn
Affects Versions: 3.0.0-beta1, 2.9.0, 2.8.2
Reporter: Eric Payne
Assignee: Eric Payne






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7424) Capacity Scheduler Intra-queue preemption: add property to only preempt up to configured MULP

2017-10-31 Thread Eric Payne (JIRA)
Eric Payne created YARN-7424:


 Summary: Capacity Scheduler Intra-queue preemption: add property 
to only preempt up to configured MULP
 Key: YARN-7424
 URL: https://issues.apache.org/jira/browse/YARN-7424
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacity scheduler, scheduler preemption
Affects Versions: 3.0.0-beta1, 2.8.2
Reporter: Eric Payne







--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 2.8.2 (RC1)

2017-10-24 Thread Eric Payne
+1 (binding)
Thanks a lot, Junping!
I built and installed the source on a 6-node pseudo cluster. I ran simple sleep 
and streaming jobs that exercised intra-queue and inter-queue preemption, and 
used user weights.
-Eric

  From: Junping Du 
 To: "common-...@hadoop.apache.org" ; 
"hdfs-...@hadoop.apache.org" ; 
"mapreduce-...@hadoop.apache.org" ; 
"yarn-dev@hadoop.apache.org"  
 Sent: Thursday, October 19, 2017 7:43 PM
 Subject: [VOTE] Release Apache Hadoop 2.8.2 (RC1)
   
Hi folks,
    I've created our new release candidate (RC1) for Apache Hadoop 2.8.2.

    Apache Hadoop 2.8.2 is the first stable release of the Hadoop 2.8 line and will 
be the latest stable/production release for Apache Hadoop - it includes 315 newly 
fixed issues since 2.8.1, 69 of which are marked as blocker/critical.

      More information about the 2.8.2 release plan can be found here: 
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+2.8+Release

      New RC is available at: 
http://home.apache.org/~junping_du/hadoop-2.8.2-RC1

      The RC tag in git is: release-2.8.2-RC1, and the latest commit id is: 
66c47f2a01ad9637879e95f80c41f798373828fb

      The maven artifacts are available via 
repository.apache.org at: 
https://repository.apache.org/content/repositories/orgapachehadoop-1064

      Please try the release and vote; the vote will run for the usual 5 days, 
ending on 10/24/2017 6pm PST time.

Thanks,

Junping


   

[jira] [Created] (YARN-7370) Intra-queue preemption properties should be refreshable

2017-10-19 Thread Eric Payne (JIRA)
Eric Payne created YARN-7370:


 Summary: Intra-queue preemption properties should be refreshable
 Key: YARN-7370
 URL: https://issues.apache.org/jira/browse/YARN-7370
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler, scheduler preemption
Affects Versions: 3.0.0-alpha3, 2.8.0
Reporter: Eric Payne


At least the properties for {{max-allowable-limit}} and {{minimum-threshold}} 
should be refreshable. It would also be nice to make 
{{intra-queue-preemption.enabled}} and {{preemption-order-policy}} refreshable.
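
For context, these appear to be the full property names in question (listed 
here from memory; please double-check against the current docs), which today 
require an RM restart rather than taking effect on a refresh:
{noformat}
yarn.resourcemanager.monitor.capacity.preemption.intra-queue-preemption.enabled
yarn.resourcemanager.monitor.capacity.preemption.intra-queue-preemption.minimum-threshold
yarn.resourcemanager.monitor.capacity.preemption.intra-queue-preemption.max-allowable-limit
yarn.resourcemanager.monitor.capacity.preemption.intra-queue-preemption.preemption-order-policy
{noformat}
Ideally a {{yarn rmadmin -refreshQueues}} would pick these up without a restart.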



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 3.0.0-beta1 RC0

2017-10-02 Thread Eric Payne
+1 (binding)
Thanks Andrew for all of your very diligent efforts.
Built from source, installed on a 6-node pseudo cluster, and successfully 
tested the following manual use cases:
o MapReduce sleep job
o MapReduce streaming job
o Cross-queue (inter-queue) preemption
o In-queue (intra-queue) preemption, both USERFIRST and PRIORITYFIRST
o User weights not equal to 1
o User weights in conjunction with in-queue preemption.
-Eric Payne

  From: Andrew Wang 
 To: "common-...@hadoop.apache.org" ; 
"hdfs-...@hadoop.apache.org" ; 
"mapreduce-...@hadoop.apache.org" ; 
"yarn-dev@hadoop.apache.org"  
 Sent: Thursday, September 28, 2017 7:04 PM
 Subject: [VOTE] Release Apache Hadoop 3.0.0-beta1 RC0
   
Hi all,

Let me start, as always, by thanking the many, many contributors who helped
with this release! I've prepared an RC0 for 3.0.0-beta1:

http://home.apache.org/~wang/3.0.0-beta1-RC0/

This vote will run five days, ending on Oct 3rd at 5PM Pacific.

beta1 contains 576 fixed JIRA issues comprising a number of bug fixes,
improvements, and feature enhancements. Notable additions include the
addition of YARN Timeline Service v2 alpha2, S3Guard, completion of the
shaded client, and HDFS erasure coding pluggable policy support.

I've done the traditional testing of running a Pi job on a pseudo cluster.
My +1 to start.

We're working internally on getting this run through our integration test
rig. I'm hoping Vijay or Ray can ring in with a +1 once that's complete.

Best,
Andrew


   

[jira] [Created] (YARN-7245) In Cap Sched UI, Max AM Resource column in Active Users Info section should be per-user

2017-09-22 Thread Eric Payne (JIRA)
Eric Payne created YARN-7245:


 Summary: In Cap Sched UI, Max AM Resource column in Active Users 
Info section should be per-user
 Key: YARN-7245
 URL: https://issues.apache.org/jira/browse/YARN-7245
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler, yarn
Affects Versions: 3.0.0-alpha4, 2.8.1, 2.9.0
Reporter: Eric Payne






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7149) Cross-queue preemption sometimes starves an underserved queue

2017-09-01 Thread Eric Payne (JIRA)
Eric Payne created YARN-7149:


 Summary: Cross-queue preemption sometimes starves an underserved 
queue
 Key: YARN-7149
 URL: https://issues.apache.org/jira/browse/YARN-7149
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler
Affects Versions: 3.0.0-alpha3, 2.9.0
Reporter: Eric Payne
Assignee: Eric Payne


In branch 2 and trunk, I am consistently seeing some use cases where 
cross-queue preemption does not happen when it should. I do not see this in 
branch-2.8.

Use Case:
| | *Size* | *Minimum Container Size* |
|MyCluster | 20 GB | 0.5 GB |

| *Queue Name* | *Capacity* | *Absolute Capacity* | *Minimum User Limit Percent 
(MULP)* | *User Limit Factor (ULF)* |
|Q1 | 50% = 10 GB | 100% = 20 GB | 10% = 1 GB | 2.0 |
|Q2 | 50% = 10 GB | 100% = 20 GB | 10% = 1 GB | 2.0 |

- {{User1}} launches {{App1}} in {{Q1}} and consumes all resources (20 GB)
- {{User2}} launches {{App2}} in {{Q2}} and requests 10 GB
- _Note: containers are 0.5 GB._
- Preemption monitor kills 2 containers (equals 1 GB) from {{App1}} in {{Q1}}.
- Capacity Scheduler assigns 2 containers (equals 1 GB) to {{App2}} in {{Q2}}.
- _No more containers are ever preempted, even though {{Q2}} is far underserved_
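
For reference, the per-round arithmetic implied by the numbers above, assuming 
the default {{total_preemption_per_round}} of 0.1:
{noformat}
cluster size               = 20 GB, min container = 0.5 GB
Q2 guarantee               = 10 GB, Q2 used after first round = 1 GB
remaining unmet guarantee  = 9 GB
expected per round         ~ 0.1 * 20 GB = 2 GB = 4 containers
observed                   = one round of 2 containers (1 GB), then nothing
{noformat}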




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7120) CapacitySchedulerPage NPE in "Aggregate scheduler counts" section

2017-08-29 Thread Eric Payne (JIRA)
Eric Payne created YARN-7120:


 Summary: CapacitySchedulerPage NPE in "Aggregate scheduler counts" 
section
 Key: YARN-7120
 URL: https://issues.apache.org/jira/browse/YARN-7120
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0-alpha3, 2.8.1, 2.9.0
Reporter: Eric Payne
Assignee: Eric Payne
Priority: Minor


The problem manifests itself by having the bottom part of the "Aggregated 
scheduler counts" section cut off on the GUI and an NPE in the RM log.
{noformat}
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$HealthBlock.render(CapacitySchedulerPage.java:558)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43)
at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet.__(Hamlet.java:30354)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.CapacitySchedulerPage$QueuesBlock.render(CapacitySchedulerPage.java:478)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69)
at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79)
at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
at 
org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
at 
org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl$EImp._v(HamletImpl.java:117)
at org.apache.hadoop.yarn.webapp.hamlet2.Hamlet$TD.__(Hamlet.java:848)
at 
org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:71)
at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
at 
org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.scheduler(RmController.java:86)
... 58 more
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7052) RM SchedulingMonitor should use HadoopExecutors when creating ScheduledExecutorService

2017-08-18 Thread Eric Payne (JIRA)
Eric Payne created YARN-7052:


 Summary: RM SchedulingMonitor should use HadoopExecutors when 
creating ScheduledExecutorService
 Key: YARN-7052
 URL: https://issues.apache.org/jira/browse/YARN-7052
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Reporter: Eric Payne


In YARN-7051, we ran into a case where the preemption monitor thread hung with 
no indication of why. This was because the preemption monitor is started by the 
{{ScheduledExecutorService}} from {{SchedulingMonitor#serviceStart}}, and then 
nothing ever gets the result of the future or allows it to throw an exception 
if needed.

At least with {{HadoopExecutor}}, it will provide a 
{{HadoopScheduledThreadPoolExecutor}} that logs the exception if one happens.
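
A minimal standalone sketch of the failure mode and the mitigation (this is 
illustrative only; the actual fix is to use the logging executor obtained from 
{{HadoopExecutors}}):
{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SwallowedExceptionDemo {
  public static void main(String[] args) throws InterruptedException {
    ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();

    // Failure mode: if the task throws, scheduleAtFixedRate() silently stops
    // rescheduling it, and the throwable is parked in the returned future,
    // which nothing ever reads.
    ses.scheduleAtFixedRate(() -> {
      throw new RuntimeException("monitor died silently");
    }, 0, 1, TimeUnit.SECONDS);

    // Mitigation: wrap the body so any throwable is at least logged, which is
    // essentially what a logging ScheduledThreadPoolExecutor does for you.
    ses.scheduleAtFixedRate(() -> {
      try {
        // ... the periodic preemption-monitor work would go here ...
      } catch (Throwable t) {
        System.err.println("Monitor task failed: " + t);
      }
    }, 0, 1, TimeUnit.SECONDS);

    Thread.sleep(3000);
    ses.shutdownNow();
  }
}
{code}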



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-7051) FifoIntraQueuePreemptionPlugin can get concurrent modification exception/

2017-08-18 Thread Eric Payne (JIRA)
Eric Payne created YARN-7051:


 Summary: FifoIntraQueuePreemptionPlugin can get concurrent 
modification exception/
 Key: YARN-7051
 URL: https://issues.apache.org/jira/browse/YARN-7051
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0-alpha3, 2.8.1, 2.9.0
Reporter: Eric Payne
Priority: Critical


{{FifoIntraQueuePreemptionPlugin#calculateUsedAMResourcesPerQueue}} has the 
following code:
{code}
Collection runningApps = leafQueue.getApplications();
Resource amUsed = Resources.createResource(0, 0);

for (FiCaSchedulerApp app : runningApps) {
{code}
{{runningApps}} is unmodifiable but not concurrent. This caused the preemption 
monitor thread to crash in the RM in one of our clusters.
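
A minimal standalone reproduction of that failure mode (not the YARN code 
itself):
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class CmeDemo {
  public static void main(String[] args) {
    List<String> apps = new ArrayList<>(Arrays.asList("app1", "app2", "app3"));
    // An unmodifiable *view*: writes through it are blocked, but it still
    // iterates over the live backing list.
    Collection<String> runningApps = Collections.unmodifiableCollection(apps);

    for (String app : runningApps) {
      System.out.println(app);
      // Simulates the scheduler mutating the backing list mid-iteration, as
      // another thread can do to leafQueue.getApplications():
      apps.remove("app3"); // next iterator step => ConcurrentModificationException
    }
  }
}
{code}
Taking a defensive copy (e.g. {{new ArrayList<>(runningApps)}}) under the 
appropriate lock before iterating is one way to avoid this.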



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 2.7.4 (RC0)

2017-08-02 Thread Eric Payne
+1 (binding)
Tested the following:




- Application History Server

-- Apps can be observed from UI
-- App and container metadata can be retrieved via REST APIs



- RM UI

-- Can kill an app from the RM UI


- Apps run in different frameworks. Frameworks tested: MR and yarn shell

-- In yarn shell framework, containers are preserved across AM restart.


- Cross-queue preemption (Inter-queue):

-- Inter-queue preemption will preempt the correct number of containers from a 
preemptable queue.

-- Inter-queue preemption will not preempt from queues with preemption disabled.


- Labeled queues work as expected where apps assigned to a queue that has a 
specific label will run only on labeled nodes.





From: Konstantin Shvachko 
To: Chris Douglas  
Cc: Andrew Wang ; Allen Wittenauer 
; "common-...@hadoop.apache.org" 
; "hdfs-...@hadoop.apache.org" 
; "mapreduce-...@hadoop.apache.org" 
; "yarn-dev@hadoop.apache.org" 

Sent: Monday, July 31, 2017 8:57 PM
Subject: Re: [VOTE] Release Apache Hadoop 2.7.4 (RC0)



Uploaded new binaries hadoop-2.7.4-RC0.tar.gz, which adds lib/native/.
Same place: http://home.apache.org/~shv/hadoop-2.7.4-RC0/

Thanks,
--Konstantin


On Mon, Jul 31, 2017 at 3:56 PM, Chris Douglas  wrote:

> On Mon, Jul 31, 2017 at 3:02 PM, Konstantin Shvachko
>  wrote:
> > For the packaging, here is the exact phrasing from the sited
> release-policy
> > document relevant to binaries:
> > "As a convenience to users that might not have the appropriate tools to
> > build a compiled version of the source, binary/bytecode packages MAY be
> > distributed alongside official Apache releases. In all such cases, the
> > binary/bytecode package MUST have the same version number as the source
> > release and MUST only add binary/bytecode files that are the result of
> > compiling that version of the source code release and its dependencies."
> > I don't think my binary package violates any of these.
>
> +1 The PMC VOTE applies to source code, only. If someone wants to
> rebuild the binary tarball with native libs and replace this one,
> that's fine.
>
> My reading of the above is that source code must be distributed with
> binaries, not that we omit the source code from binary releases... -C
>
> > But I'll upload an additional tar.gz with native bits and no src, as you
> > guys requested.
> > Will keep it as RC0 as there is no source code change and it comes from
> the
> > same build.
> > Hope this is satisfactory.
> >
> > Thanks,
> > --Konstantin
> >
> > On Mon, Jul 31, 2017 at 1:53 PM, Andrew Wang 
> > wrote:
> >
> >> I agree with Brahma on the two issues flagged (having src in the binary
> >> tarball, missing native libs). These are regressions from prior
> releases.
> >>
> >> As an aside, "we release binaries as a convenience" doesn't relax the
> >> quality bar. The binaries are linked on our website and distributed
> through
> >> official Apache channels. They have to adhere to Apache release
> >> requirements. And, most users consume our work via Maven dependencies,
> >> which are binary artifacts.
> >>
> >> http://www.apache.org/legal/release-policy.html goes into this in more
> >> detail. A release must minimally include source packages, and can also
> >> include binary artifacts.
> >>
> >> Best,
> >> Andrew
> >>
> >> On Mon, Jul 31, 2017 at 12:30 PM, Konstantin Shvachko <
> >> shv.had...@gmail.com> wrote:
> >>
> >>> To avoid any confusion in this regard. I built RC0 manually in
> compliance
> >>> with Apache release policy
> >>> http://www.apache.org/legal/release-policy.html
> >>> I edited the HowToReleasePreDSBCR page to make sure people don't use
> >>> Jenkins option for building.
> >>>
> >>> A side note. This particular build is broken anyways, so no worries
> there.
> >>> I think though it would be useful to have it working for testing and
> as a
> >>> packaging standard.
> >>>
> >>> Thanks,
> >>> --Konstantin
> >>>
> >>> On Mon, Jul 31, 2017 at 11:40 AM, Allen Wittenauer <
> >>> a...@effectivemachines.com
> >>> > wrote:
> >>>
> >>> >
> >>> > > On Jul 31, 2017, at 11:20 AM, Konstantin Shvachko <
> >>> shv.had...@gmail.com>
> >>> > wrote:
> >>> > >
> >>> > > https://wiki.apache.org/hadoop/HowToReleasePreDSBCR
> >>> >
> >>> > FYI:
> >>> >
> >>> > If you are using ASF Jenkins to create an ASF release
> >>> > artifact, it's pretty much an automatic vote failure as any such
> >>> release is
> >>> > in violation of ASF policy.
> >>> >
> >>> >
> >>>
> >>
> >>
>

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 3.0.0-alpha4-RC0

2017-07-05 Thread Eric Payne
Thanks Andrew.
I downloaded the source, built it, and installed it onto a pseudo distributed 
4-node cluster. 

I ran mapred and streaming test cases, including sleep and wordcount.
+1 (non-binding)
-Eric

  From: Andrew Wang 
 To: "common-...@hadoop.apache.org" ; 
"hdfs-...@hadoop.apache.org" ; 
"mapreduce-...@hadoop.apache.org" ; 
"yarn-dev@hadoop.apache.org"  
 Sent: Thursday, June 29, 2017 9:41 PM
 Subject: [VOTE] Release Apache Hadoop 3.0.0-alpha4-RC0
   
Hi all,

As always, thanks to the many, many contributors who helped with this
release! I've prepared an RC0 for 3.0.0-alpha4:

http://home.apache.org/~wang/3.0.0-alpha4-RC0/

The standard 5-day vote would run until midnight on Tuesday, July 4th.
Given that July 4th is a holiday in the US, I expect this vote might have
to be extended, but I'd like to close the vote relatively soon after.

I've done my traditional testing of a pseudo-distributed cluster with a
single task pi job, which was successful.

Normally my testing would end there, but I'm slightly more confident this
time. At Cloudera, we've successfully packaged and deployed a snapshot from
a few days ago, and run basic smoke tests. Some bugs found from this
include HDFS-11956, which fixes backwards compat with Hadoop 2 clients, and
the revert of HDFS-11696, which broke NN QJM HA setup.

Vijay is working on a test run with a fuller test suite (the results of
which we can hopefully post soon).

My +1 to start,

Best,
Andrew


   

Pre commit builds seem broken

2017-06-30 Thread Eric Payne
Hey, does anyone know why the precommit builds seem broken? I tried to kick one, 
and it hangs for a long time and then produces the following:

#16288 (pending—H14 doesn’t have label Hadoop&&!H9&&!H5&&!H6; H15 doesn’t 
have label Hadoop&&!H9&&!H5&&!H6; H16 doesn’t have label Hadoop&&!H9&&!H5&&!H6; 
H17 doesn’t have label Hadoop&&!H9&&!H5&&!H6; H18 doesn’t have label 
Hadoop&&!H9&&!H5&&!H6; H19 doesn’t have label Hadoop&&!H9&&!H5&&!H6; H20 
doesn’t have label Hadoop&&!H9&&!H5&&!H6; H21 doesn’t have label 
Hadoop&&!H9&&!H5&&!H6; H22 doesn’t have label Hadoop&&!H9&&!H5&&!H6; H24 
doesn’t have label Hadoop&&!H9&&!H5&&!H6; H5 doesn’t have label 
Hadoop&&!H9&&!H5&&!H6; H6 doesn’t have label Hadoop&&!H9&&!H5&&!H6; H9 doesn’t 
have label Hadoop&&!H9&&!H5&&!H6; Waiting for pending items to get a node 
assigned; beam1 doesn’t have label Hadoop&&!H9&&!H5&&!H6; beam2 doesn’t have 
label Hadoop&&!H9&&!H5&&!H6; beam3 doesn’t have label Hadoop&&!H9&&!H5&&!H6; 
beam4 doesn’t have label Hadoop&&!H9&&!H5&&!H6; beam5 doesn’t have label 
Hadoop&&!H9&&!H5&&!H6; beam6 doesn’t have label Hadoop&&!H9&&!H5&&!H6; beam7 
doesn’t have label Hadoop&&!H9&&!H5&&!H6; beam8 doesn’t have label 
Hadoop&&!H9&&!H5&&!H6; cassandra1 doesn’t have label Hadoop&&!H9&&!H5&&!H6; 
cassandra10 doesn’t have label Hadoop&&!H9&&!H5&&!H6; cassan

[jira] [Created] (YARN-6585) RM fails to start when upgrading from 2.7 to 2.8 for clusters with node labels.

2017-05-11 Thread Eric Payne (JIRA)
Eric Payne created YARN-6585:


 Summary: RM fails to start when upgrading from 2.7 to 2.8 for 
clusters with node labels.
 Key: YARN-6585
 URL: https://issues.apache.org/jira/browse/YARN-6585
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Eric Payne


{noformat}
Caused by: java.io.IOException: Not all labels being replaced contained by 
known label collections, please check, new labels=[abc]
at 
org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.checkReplaceLabelsOnNode(CommonNodeLabelsManager.java:718)
at 
org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.replaceLabelsOnNode(CommonNodeLabelsManager.java:737)
at 
org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.replaceLabelsOnNode(RMNodeLabelsManager.java:189)
at 
org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.loadFromMirror(FileSystemNodeLabelsStore.java:181)
at 
org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:208)
at 
org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:251)
at 
org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:265)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
... 13 more
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6248) Killing an app with pending container requests leaves the user in UsersManager

2017-02-27 Thread Eric Payne (JIRA)
Eric Payne created YARN-6248:


 Summary: Killing an app with pending container requests leaves the 
user in UsersManager
 Key: YARN-6248
 URL: https://issues.apache.org/jira/browse/YARN-6248
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0-alpha3
Reporter: Eric Payne
Assignee: Eric Payne


If an app is still asking for resources when it is killed, the user is left in 
the UsersManager structure and shows up on the GUI.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-6165) Intra-queue preemption occurs even when preemption is turned off for a specific queue.

2017-02-09 Thread Eric Payne (JIRA)
Eric Payne created YARN-6165:


 Summary: Intra-queue preemption occurs even when preemption is 
turned off for a specific queue.
 Key: YARN-6165
 URL: https://issues.apache.org/jira/browse/YARN-6165
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler, scheduler preemption
Affects Versions: 3.0.0-alpha2
Reporter: Eric Payne


Intra-queue preemption occurs even when preemption is turned on for the whole 
cluster ({{yarn.resourcemanager.scheduler.monitor.enable == true}}) but turned 
off for a specific queue 
({{yarn.scheduler.capacity.root.queue1.disable_preemption == true}}).
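
The configuration combination in question, for reference (the first property 
lives in yarn-site.xml, the second in capacity-scheduler.xml):
{noformat}
<!-- preemption monitor enabled cluster-wide -->
<property>
  <name>yarn.resourcemanager.scheduler.monitor.enable</name>
  <value>true</value>
</property>

<!-- queue1 opted out; intra-queue preemption should honor this, but does not -->
<property>
  <name>yarn.scheduler.capacity.root.queue1.disable_preemption</name>
  <value>true</value>
</property>
{noformat}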



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



[jira] [Created] (YARN-5973) TestCapacitySchedulerSurgicalPreemption sometimes fails

2016-12-06 Thread Eric Payne (JIRA)
Eric Payne created YARN-5973:


 Summary: TestCapacitySchedulerSurgicalPreemption sometimes fails
 Key: YARN-5973
 URL: https://issues.apache.org/jira/browse/YARN-5973
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler, scheduler preemption
Affects Versions: 2.8.0
Reporter: Eric Payne
Priority: Minor


The tests in {{TestCapacitySchedulerSurgicalPreemption}} appear to be racy. 
They often pass, but the following errors sometimes occur:
{noformat}
testSimpleSurgicalPreemption(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption)
  Time elapsed: 14.671 sec  <<< FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.fail(Assert.java:95)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerPreemptionTestBase.waitNumberOfLiveContainersFromApp(CapacitySchedulerPreemptionTestBase.java:110)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption.testSimpleSurgicalPreemption(TestCapacitySchedulerSurgicalPreemption.java:143)
{noformat}
{noformat}
testSurgicalPreemptionWithAvailableResource(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption)
  Time elapsed: 9.503 sec  <<< FAILURE!
java.lang.AssertionError: expected:<3> but was:<2>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerSurgicalPreemption.testSurgicalPreemptionWithAvailableResource(TestCapacitySchedulerSurgicalPreemption.java:220)
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Re: Updated 2.8.0-SNAPSHOT artifact

2016-11-10 Thread Eric Payne
How do we come to a resolution regarding whether to re-cut branch-2.8 or 
release it as it is (after fixing blockers)?
There are some things in branch-2 that I would like to pull back into 
branch-2.8, so a resolution to this question will affect how I proceed.
Thanks,
-Eric
  From: Karthik Kambatla 
 To: Ming Ma  
Cc: Sangjin Lee ; Jason Lowe ; Akira 
Ajisaka ; Brahma Reddy Battula 
; Vinod Kumar Vavilapalli ; 
"common-...@hadoop.apache.org" ; 
"hdfs-...@hadoop.apache.org" ; 
"mapreduce-...@hadoop.apache.org" ; 
"yarn-dev@hadoop.apache.org" 
 Sent: Thursday, November 10, 2016 1:56 AM
 Subject: Re: Updated 2.8.0-SNAPSHOT artifact
   
If there is interest in releasing off of branch-2.8, we should definitely
do that. As Sangjin mentioned, there might be value in doing 2.9 off
branch-2 too.

How do we go about maintenance releases along those minor lines, and when
would we discontinue 2.6.x/2.7.x releases?

On Wed, Nov 9, 2016 at 12:06 PM, Ming Ma  wrote:

> I would also prefer releasing current 2.8 branch sooner. There are several
> incomplete features in branch-2 such as YARN-914 and HDFS-7877 that are
> better served if we can complete them in the next major release. Letting
> them span across multiple releases might not be desirable as there could be
> some potential compatibility issues involved. Therefore if we recut 2.8 it
> means we have to work on those items before the new 2.8 is released which
> could cause major delay on the schedule.
>
> On Mon, Nov 7, 2016 at 10:37 AM, Sangjin Lee  wrote:
>
>> +1. Resetting the 2.8 effort and the branch at this point may be
>> counter-productive. IMO we should focus on resolving the remaining
>> blockers
>> and getting it out the door. I also think that we should seriously
>> consider
>> 2.9 as well, as a fairly large number of changes have accumulated in
>> branch-2 (over branch-2.8).
>>
>>
>> Sangjin
>>
>> On Fri, Nov 4, 2016 at 3:38 PM, Jason Lowe 
>> wrote:
>>
>> > At this point my preference would be to do the most expeditious thing to
>> > release 2.8, whether that's sticking with the branch-2.8 we have today
>> or
>> > re-cutting it on branch-2.  Doing a quick JIRA query, there's been
>> almost
>> > 2,400 JIRAs resolved in 2.8.0 (1).  For many of them, it's well-past
>> time
>> > they saw a release vehicle.  If re-cutting the branch means we have to
>> wrap
>> > up a few extra things that are still in-progress on branch-2 or add a
>> few
>> > more blockers to the list before we release then I'd rather stay where
>> > we're at and ship it ASAP.
>> >
>> > Jason
>> > (1) https://issues.apache.org/jira/issues/?jql=project%20in%
>> > 20%28hadoop%2C%20yarn%2C%20mapreduce%2C%20hdfs%29%
>> > 20and%20resolution%20%3D%20Fixed%20and%20fixVersion%20%3D%202.8.0
>> >
>> >
>> >
>> >
>> >
>> >    On Tuesday, October 25, 2016 5:31 PM, Karthik Kambatla <
>> > ka...@cloudera.com> wrote:
>> >
>> >
>> >  Is there value in releasing current branch-2.8? Aren't we better off
>> > re-cutting the branch off of branch-2?
>> >
>> > On Tue, Oct 25, 2016 at 12:20 AM, Akira Ajisaka <
>> > ajisa...@oss.nttdata.co.jp>
>> > wrote:
>> >
>> > > It's almost a year since branch-2.8 has cut.
>> > > I'm thinking we need to release 2.8.0 ASAP.
>> > >
>> > > According to the following list, there are 5 blocker and 6 critical
>> > issues.
>> > > https://issues.apache.org/jira/issues/?filter=12334985
>> > >
>> > > Regards,
>> > > Akira
>> > >
>> > >
>> > > On 10/18/16 10:47, Brahma Reddy Battula wrote:
>> > >
>> > >> Hi Vinod,
>> > >>
>> > >> Any plan on first RC for branch-2.8 ? I think, it has been long time.
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> --Brahma Reddy Battula
>> > >>
>> > >> -Original Message-
>> > >> From: Vinod Kumar Vavilapalli [mailto:vino...@apache.org]
>> > >> Sent: 20 August 2016 00:56
>> > >> To: Jonathan Eagles
>> > >> Cc: common-...@hadoop.apache.org
>> > >> Subject: Re: Updated 2.8.0-SNAPSHOT artifact
>> > >>
>> > >> Jon,
>> > >>
>> > >> That is around the time when I branched 2.8, so I guess you were
>> getting
>> > >> SNAPSHOT artifacts till then from the branch-2 nightly builds.
>> > >>
>> > >> If you need it, we can set up SNAPSHOT builds. Or just wait for the
>> > first
>> > >> RC, which is around the corner.
>> > >>
>> > >> +Vinod
>> > >>
>> > >> On Jul 28, 2016, at 4:27 PM, Jonathan Eagles 
>> wrote:
>> > >>>
>> > >>> Latest snapshot is uploaded in Nov 2015, but checkins are still
>> coming
>> > >>> in quite frequently.
>> > >>> https://repository.apache.org/content/repositories/snapshots
>> /org/apach
>> > >>> e/hadoop/hadoop-yarn-api/
>> > >>>
>> > >>> Are there any plans to start producing updated SNAPSHOT artifacts
>> for
>> > >>> current hadoop development lines?
>> > >>>
>> > >>
>> > >>
>> > >> 
>> -
>> > >> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
>> > >> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>> > >>
>> > >>
>> > >> --

[jira] [Created] (YARN-5555) Scheduler UI: "% of Queue" is inaccurate if leaf queue is hierarchically nested.

2016-08-23 Thread Eric Payne (JIRA)
Eric Payne created YARN-5555:


 Summary: Scheduler UI: "% of Queue" is inaccurate if leaf queue is 
hierarchically nested.
 Key: YARN-5555
 URL: https://issues.apache.org/jira/browse/YARN-5555
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.8.0
Reporter: Eric Payne
Assignee: Eric Payne
Priority: Minor


If a leaf queue is hierarchically nested (e.g., {{root.a.a1}}, {{root.a.a2}}), 
the values in the "*% of Queue*" column in the apps section of the Scheduler UI 
is calculated as if the leaf queue ({{a1}}) were a direct child of {{root}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 2.7.3 RC0

2016-07-26 Thread Eric Payne
+1 (non-binding)
Thanks, Vinod, for all of your hard work and congratulations in completing this 
release.
After downloading and building the source, I installed Hadoop 2.7.3 RC0 on a 
3-node, multi-tenant, insecure cluster. I ran manual tests to ensure the 
following:
- Ensure that user limit percent is honored for multiple users in the same queue
- Ensure that cross-queue preemption occurs when a preemptable queue is over 
its guaranteed capacity
- Ensure that jobs submitted to labelled queues only run on the labelled nodes.
- Ensure that RM UI shows correct queue resource metrics when jobs are running 
in labelled queues (YARN-4751).
- Ensure that a yarn distributed shell application can be launched and complete 
successfully.

Eric Payne
  From: Vinod Kumar Vavilapalli 
 To: "common-...@hadoop.apache.org" ; 
hdfs-...@hadoop.apache.org; yarn-dev@hadoop.apache.org; 
"mapreduce-...@hadoop.apache.org"  
Cc: Vinod Kumar Vavilapalli 
 Sent: Friday, July 22, 2016 9:15 PM
 Subject: [VOTE] Release Apache Hadoop 2.7.3 RC0
   
Hi all,

I've created a release candidate RC0 for Apache Hadoop 2.7.3.

As discussed before, this is the next maintenance release to follow up 2.7.2.

The RC is available for validation at: 
http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/ 
<http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/>

The RC tag in git is: release-2.7.3-RC0

The maven artifacts are available via repository.apache.org 
<http://repository.apache.org/> at 
https://repository.apache.org/content/repositories/orgapachehadoop-1040/ 
<https://repository.apache.org/content/repositories/orgapachehadoop-1040/>

The release-notes are inside the tar-balls at location 
hadoop-common-project/hadoop-common/src/main/docs/releasenotes.html. I hosted 
this at http://home.apache.org/~vinodkv/hadoop-2.7.3-RC0/releasenotes.html 
<http://people.apache.org/~vinodkv/hadoop-2.7.2-RC1/releasenotes.html> for your 
quick perusal.

As you may have noted, a very long fix-cycle for the License & Notice issues 
(HADOOP-12893) caused 2.7.3 (along with every other Hadoop release) to slip by 
quite a bit. This release's related discussion thread is linked below: [1].

Please try the release and vote; the vote will run for the usual 5 days.

Thanks,
Vinod

[1]: 2.7.3 release plan: 
https://www.mail-archive.com/hdfs-dev%40hadoop.apache.org/msg24439.html 
<http://markmail.org/thread/6yv2fyrs4jlepmmr>

   

[jira] [Created] (YARN-4751) In 2.7, Labeled queue usage not shown properly in capacity scheduler UI

2016-03-01 Thread Eric Payne (JIRA)
Eric Payne created YARN-4751:


 Summary: In 2.7, Labeled queue usage not shown properly in 
capacity scheduler UI
 Key: YARN-4751
 URL: https://issues.apache.org/jira/browse/YARN-4751
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler, yarn
Affects Versions: 2.7.3
Reporter: Eric Payne
Assignee: Eric Payne


In 2.6 and 2.7, the capacity scheduler UI does not have the queue graphs 
separated by partition. When applications are running on a labeled queue, no 
color is shown in the bar graph, and several of the "Used" metrics are zero.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] Release Apache Hadoop 2.6.4 RC0

2016-02-08 Thread Eric Payne
Naganarasimha Garla, thanks for the reply.

Yes, I used the node ID. I did not include a port. Here are the steps I used, 
which work for me on 2.7:

- yarn rmadmin -addToClusterNodeLabels abc
- yarn rmadmin -replaceLabelsOnNode hostname.company.com=abc
- configure queue properties as appropriate
- yarn rmadmin -refreshQueues

As I say, this works for me when I try it on 2.7 and later. It's probably 
something with my environment. I will continue to look into it.

Thanks for your help
-Eric



From: Naganarasimha Garla 
To: mapreduce-...@hadoop.apache.org; Eric Payne  
Cc: "common-...@hadoop.apache.org" ; 
"hdfs-...@hadoop.apache.org" ; 
"yarn-dev@hadoop.apache.org" 
Sent: Monday, February 8, 2016 1:01 PM
Subject: Re: [VOTE] Release Apache Hadoop 2.6.4 RC0



+1 (non binding)

* Downloaded hadoop-2.6.4-RC0-src.tar.gz- built from source both package, 
install, and verified the MD5 checksum

* Did a Pseudo cluster and tested basic hdfs operations
* Ran sleep job and Pi job
* Added node label and ran job under the label by configuring 
default-node-label-expression and it ran fine

Eric Payne,
Hope you tried adding/replacing the labels using NodeId/Node Address and not 
the HTTP address!
I executed the following command to configure the label and node
  "./yarn rmadmin -replaceLabelsOnNode  "localhost:43795,test1"  "
After this was able to submit the job for a label

Regards,
+ Naga


On Mon, Feb 8, 2016 at 11:06 PM, Eric Payne  
wrote:

Hi Junping Du. Thank you for your work preparing this release.
>I did the following things to test release Hadoop 2.6.4 rc0:
>- Downloaded hadoop-2.6.4-RC0-src.tar.gz
>- Built from source: both package, install, and eclipse:eclipse
>- Set up a 3-node, unsecured cluster with 3 queues, one of which has preemption enabled
>- Ran a successful test to ensure that preemption would happen to containers on the preemptable queue if they were needed for an application on another queue
>- Ran successful streaming and yarn shell tests
>Junping, I did have a concern about labelled nodes and queues. Is full label 
>support backported to 2.6.4? I see that the syntax for the rmadmin command 
>lists label commands like -addToClusterNodeLabels and -replaceLabelsOnNode. I 
>was able to add a label (using -addToClusterNodeLabels) and I was able to 
>define a queue whose accessible node label was listed with my specified label. 
>However, when I tried to set the node label to a specific node using 
>-replaceLabelsOnNode, the label does not show up on the specified node in 
>cluster nodes UI (http://RM:8088/cluster/nodes). I also confirmed that 
>submitting a job to the labelled queue gets accepted but never runs, which is 
>the behavior I would expect if no node had the specified label. I will also 
>add that this procedure works fine in 2.7.
>Thanks,
>-Eric Payne
>
>  From: Junping Du 
> To: "hdfs-...@hadoop.apache.org" ; 
> "yarn-dev@hadoop.apache.org" ; 
> "mapreduce-...@hadoop.apache.org" ; 
> "common-...@hadoop.apache.org" 
> Sent: Wednesday, February 3, 2016 1:01 AM
> Subject: [VOTE] Release Apache Hadoop 2.6.4 RC0
>
>
>Hi community folks,
>  I've created a release candidate RC0 for Apache Hadoop 2.6.4 (the next 
> maintenance release to follow up 2.6.3.) according to email thread of release 
> plan 2.6.4 [1]. Below is details of this release candidate:
>
>The RC is available for validation at:
>*http://people.apache.org/~junping_du/hadoop-2.6.4-RC0/
><http://people.apache.org/~junping_du/hadoop-2.6.4-RC0/>*
>
>The RC tag in git is: release-2.6.4-RC0
>
>The maven artifacts are staged via repository.apache.org at:
>*https://repository.apache.org/content/repositories/orgapachehadoop-1028/?
><https://repository.apache.org/content/repositories/orgapachehadoop-1028/>*
>
>You can find my public key at:
>http://svn.apache.org/repos/asf/hadoop/common/dist/KEYS
>
>Please try the release and vote. The vote will run for the usual 5 days.
>
>Thanks!
>
>
>Cheers,
>
>Junping
>
>
>[1]: 2.6.4 release plan: http://markmail.org/message/fk3ud3c665lscvx5?
>
>
>  


Re: [VOTE] Release Apache Hadoop 2.6.4 RC0

2016-02-08 Thread Eric Payne
Hi Junping Du. Thank you for your work preparing this release.
I did the following things to test release Hadoop 2.6.4 rc0:
- Downloaded hadoop-2.6.4-RC0-src.tar.gz
- Built from source: both package, install, and eclipse:eclipse
- Set up a 3-node, unsecured cluster with 3 queues, one of which has preemption enabled
- Ran a successful test to ensure that preemption would happen to containers on the preemptable queue if they were needed for an application on another queue
- Ran successful streaming and yarn shell tests
Junping, I did have a concern about labelled nodes and queues. Is full label 
support backported to 2.6.4? I see that the syntax for the rmadmin command 
lists label commands like -addToClusterNodeLabels and -replaceLabelsOnNode. I 
was able to add a label (using -addToClusterNodeLabels), and I was able to 
define a queue whose accessible node label was listed with my specified label. 
However, when I tried to set the label on a specific node using 
-replaceLabelsOnNode, the label did not show up for that node in the cluster 
nodes UI (http://RM:8088/cluster/nodes). I also confirmed that a job submitted 
to the labelled queue gets accepted but never runs, which is the behavior I 
would expect if no node had the specified label. I will also add that this 
procedure works fine in 2.7.
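
For reference, this is roughly the sequence I ran (the syntax below follows the 
2.7 documentation, and the label and host names are illustrative; 2.6.4 may 
expect a slightly different form):

    yarn rmadmin -addToClusterNodeLabels "testlabel"
    yarn rmadmin -replaceLabelsOnNode "node1.example.com=testlabel"

After the second command, I would expect "testlabel" to show up for that node 
at http://RM:8088/cluster/nodes, but on 2.6.4 RC0 it does not.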
Thanks,
-Eric Payne

From: Junping Du
To: "hdfs-...@hadoop.apache.org"; "yarn-dev@hadoop.apache.org"; 
"mapreduce-...@hadoop.apache.org"; "common-...@hadoop.apache.org"
Sent: Wednesday, February 3, 2016 1:01 AM
Subject: [VOTE] Release Apache Hadoop 2.6.4 RC0
Hi community folks,
  I've created release candidate RC0 for Apache Hadoop 2.6.4 (the next 
maintenance release, following 2.6.3), per the email thread on the 2.6.4 
release plan [1]. Below are the details of this release candidate:

The RC is available for validation at:
http://people.apache.org/~junping_du/hadoop-2.6.4-RC0/

The RC tag in git is: release-2.6.4-RC0

The maven artifacts are staged via repository.apache.org at:
https://repository.apache.org/content/repositories/orgapachehadoop-1028/

You can find my public key at:
http://svn.apache.org/repos/asf/hadoop/common/dist/KEYS

Please try the release and vote. The vote will run for the usual 5 days.
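
For anyone new to RC validation, a typical flow looks like the following (the 
tarball name is an assumption based on the RC directory layout; adjust as 
needed):

    wget http://people.apache.org/~junping_du/hadoop-2.6.4-RC0/hadoop-2.6.4-src.tar.gz
    wget http://people.apache.org/~junping_du/hadoop-2.6.4-RC0/hadoop-2.6.4-src.tar.gz.asc
    curl -o KEYS http://svn.apache.org/repos/asf/hadoop/common/dist/KEYS
    gpg --import KEYS
    gpg --verify hadoop-2.6.4-src.tar.gz.asc hadoop-2.6.4-src.tar.gz
    tar xzf hadoop-2.6.4-src.tar.gz
    cd hadoop-2.6.4-src && mvn package -Pdist -DskipTests -Dtar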

Thanks!


Cheers,

Junping


[1]: 2.6.4 release plan: http://markmail.org/message/fk3ud3c665lscvx5?


  

[jira] [Resolved] (YARN-4390) Consider container request size during CS preemption

2015-12-14 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne resolved YARN-4390.
--
Resolution: Duplicate

Closing this ticket in favor of YARN-4108

> Consider container request size during CS preemption
> 
>
> Key: YARN-4390
> URL: https://issues.apache.org/jira/browse/YARN-4390
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Affects Versions: 3.0.0, 2.8.0, 2.7.3
>    Reporter: Eric Payne
>Assignee: Eric Payne
>
> There are multiple reasons why preemption could unnecessarily preempt 
> containers. One is that an app could be requesting a large container (say 
> 8 GB), and the preemption monitor could conceivably preempt multiple smaller 
> containers (say, eight 1-GB containers) in order to fill the large container 
> request. These smaller containers would then be rejected by the requesting AM 
> and potentially given right back to the preempted app.





[jira] [Resolved] (YARN-4226) Make capacity scheduler queue's preemption status REST API consistent with GUI

2015-12-14 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne resolved YARN-4226.
--
Resolution: Won't Fix

Since the code works and is only slightly confusing, I am closing this ticket 
as WontFix.

> Make capacity scheduler queue's preemption status REST API consistent with GUI
> --
>
> Key: YARN-4226
> URL: https://issues.apache.org/jira/browse/YARN-4226
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, yarn
>Affects Versions: 2.7.1
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Minor
>
> In the capacity scheduler GUI, the preemption status has the following form:
> {code}
> Preemption:   disabled
> {code}
> However, the REST API shows the following for the same status:
> {code}
> "preemptionDisabled":true
> {code}
> The latter is confusing and should be consistent with the format in the GUI.





[jira] [Created] (YARN-4422) Generic AHS sometimes doesn't show started, node, or logs on App page

2015-12-04 Thread Eric Payne (JIRA)
Eric Payne created YARN-4422:


 Summary: Generic AHS sometimes doesn't show started, node, or logs 
on App page
 Key: YARN-4422
 URL: https://issues.apache.org/jira/browse/YARN-4422
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Eric Payne
Assignee: Eric Payne


Sometimes the AM container for an app isn't able to start the JVM. This can 
happen if bogus JVM options are given to the AM container 
({{-Dyarn.app.mapreduce.am.command-opts=-InvalidJvmOption}}) or when the AM 
container's environment variables are misconfigured 
({{-Dyarn.app.mapreduce.am.env="JAVA_HOME=/foo/bar/baz"}}).

When the AM container for an app isn't able to start the JVM, the Application 
page for that application shows {{N/A}} for the {{Started}}, {{Node}}, and 
{{Logs}} columns. It _does_ have links for each app attempt, and if you click 
on one of them, you go to the Application Attempt page, where you can see all 
containers with links to their logs and nodes, including the AM container. But 
none of that shows up for the app attempts on the Application page.

Also, on the Application Attempt page, in the {{Application Attempt Overview}} 
section, the {{AM Container}} value is {{null}} and the {{Node}} value is 
{{N/A}}.
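
One hypothetical way to reproduce (the test jar name and sleep-job options are 
assumptions for illustration, not part of this report):
{code}
hadoop jar hadoop-mapreduce-client-jobclient-*-tests.jar sleep \
    -Dyarn.app.mapreduce.am.command-opts=-InvalidJvmOption \
    -m 1 -r 0 -mt 1000
# then open http://RM:8188/applicationhistory and click the application's ID
{code}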





[jira] [Created] (YARN-4390) Consider container request size during CS preemption

2015-11-24 Thread Eric Payne (JIRA)
Eric Payne created YARN-4390:


 Summary: Consider container request size during CS preemption
 Key: YARN-4390
 URL: https://issues.apache.org/jira/browse/YARN-4390
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler
Affects Versions: 3.0.0, 2.8.0, 2.7.3
Reporter: Eric Payne
Assignee: Eric Payne


There are multiple reasons why preemption could unnecessarily preempt 
containers. One is that an app could be requesting a large container (say 
8 GB), and the preemption monitor could conceivably preempt multiple smaller 
containers (say, eight 1-GB containers) in order to fill the large container 
request. These smaller containers would then be rejected by the requesting AM 
and potentially given right back to the preempted app.





[jira] [Created] (YARN-4226) Make capacity scheduler queue's preemption status REST API consistent with GUI

2015-10-05 Thread Eric Payne (JIRA)
Eric Payne created YARN-4226:


 Summary: Make capacity scheduler queue's preemption status REST 
API consistent with GUI
 Key: YARN-4226
 URL: https://issues.apache.org/jira/browse/YARN-4226
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler, yarn
Affects Versions: 2.7.1
Reporter: Eric Payne
Assignee: Eric Payne
Priority: Minor


In the capacity scheduler GUI, the preemption status has the following form:
{code}
Preemption: disabled
{code}
However, the REST API shows the following for the same status:
{code}
"preemptionDisabled":true
{code}
The latter is confusing and should be consistent with the format in the GUI.
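
One way to see the two side by side (the RM host is a placeholder):
{code}
curl -s http://RM:8088/ws/v1/cluster/scheduler | grep -o '"preemptionDisabled":[a-z]*'
#   "preemptionDisabled":true
# while http://RM:8088/cluster/scheduler renders the same state as
#   Preemption: disabled
{code}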





[jira] [Created] (YARN-4225) Add preemption status to {{yarn queue -status}}

2015-10-05 Thread Eric Payne (JIRA)
Eric Payne created YARN-4225:


 Summary: Add preemption status to {{yarn queue -status}}
 Key: YARN-4225
 URL: https://issues.apache.org/jira/browse/YARN-4225
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.1
Reporter: Eric Payne
Assignee: Eric Payne
Priority: Minor
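
For context, {{yarn queue -status}} on 2.7 prints roughly the following 
(abbreviated; exact fields may differ), and the proposal is to add a 
preemption line to it:
{code}
$ yarn queue -status default
Queue Information :
Queue Name : default
        State : RUNNING
        Capacity : 100.0%
        Current Capacity : 0.0%
        Maximum Capacity : 100.0%
        Accessible Node Labels : *
        Preemption : enabled    <-- proposed addition
{code}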








[jira] [Resolved] (YARN-4217) Failed AM attempt retries on same failed host

2015-10-02 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne resolved YARN-4217.
--
Resolution: Duplicate

bq. Eric Payne - is this a duplicate of YARN-2005?
[~vvasudev], yes it is. I did do a search, but I missed that one. Thanks a lot!

> Failed AM attempt retries on same failed host
> -
>
> Key: YARN-4217
> URL: https://issues.apache.org/jira/browse/YARN-4217
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications
>Affects Versions: 2.7.1
>    Reporter: Eric Payne
>
> This happens when the cluster is maxed out. One node is going bad, so 
> everything that runs on it fails quickly, which means the bad node is never 
> busy. Since the cluster is maxed out, when the RM looks for a node with 
> available resources, it will always find the almost-bad one, because nothing 
> can run on it and therefore it always has resources available.





[jira] [Created] (YARN-4217) Failed AM attempt retries on same failed host

2015-10-01 Thread Eric Payne (JIRA)
Eric Payne created YARN-4217:


 Summary: Failed AM attempt retries on same failed host
 Key: YARN-4217
 URL: https://issues.apache.org/jira/browse/YARN-4217
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications
Affects Versions: 2.7.1
Reporter: Eric Payne


This happens when the cluster is maxed out. One node is going bad, so 
everything that runs on it fails quickly, which means the bad node is never 
busy. Since the cluster is maxed out, when the RM looks for a node with 
available resources, it will always find the almost-bad one, because nothing 
can run on it and therefore it always has resources available.





[jira] [Created] (YARN-3978) Configurably turn off the saving of container info in Generic AHS

2015-07-25 Thread Eric Payne (JIRA)
Eric Payne created YARN-3978:


 Summary: Configurably turn off the saving of container info in 
Generic AHS
 Key: YARN-3978
 URL: https://issues.apache.org/jira/browse/YARN-3978
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: timelineserver, yarn
Reporter: Eric Payne
Assignee: Eric Payne


Depending on how each application's metadata is stored, one week's worth of 
data in the Generic Application History Server's database can grow to almost a 
terabyte of local disk space. To alleviate this, I suggest adding a 
configuration option to turn off the saving of non-AM container metadata in 
the GAHS data store.
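
A sketch of the kind of switch intended, as a yarn-site.xml entry (the property 
name here is a guess at what a patch might introduce, not a final name):
{noformat}
<property>
  <name>yarn.timeline-service.generic-application-history.save-non-am-container-meta-info</name>
  <value>false</value>
</property>
{noformat}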






[jira] [Created] (YARN-3905) Application History Server UI NPEs when accessing apps run after RM restart

2015-07-09 Thread Eric Payne (JIRA)
Eric Payne created YARN-3905:


 Summary: Application History Server UI NPEs when accessing apps 
run after RM restart
 Key: YARN-3905
 URL: https://issues.apache.org/jira/browse/YARN-3905
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.7.1, 2.7.0, 2.8.0
Reporter: Eric Payne
Assignee: Eric Payne


From the Application History URL (http://RmHostName:8188/applicationhistory), 
clicking on the application ID of an app that was run after the RM daemon has 
been restarted results in a 500 error:
{noformat}
Sorry, got error 500
Please consult RFC 2616 for meanings of the error code.
{noformat}

The stack trace is as follows:
{code}
2015-07-09 20:13:15,584 [2068024519@qtp-769046918-3] INFO 
applicationhistoryservice.FileSystemApplicationHistoryStore: Completed reading 
history information of all application attempts of application 
application_1436472584878_0001
2015-07-09 20:13:15,591 [2068024519@qtp-769046918-3] ERROR webapp.AppBlock: 
Failed to read the AM container of the application attempt 
appattempt_1436472584878_0001_01.
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToContainerReport(ApplicationHistoryManagerImpl.java:206)
at 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainer(ApplicationHistoryManagerImpl.java:199)
at 
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.getContainerReport(ApplicationHistoryClientService.java:205)
at 
org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:272)
at 
org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:267)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
at 
org.apache.hadoop.yarn.server.webapp.AppBlock.generateApplicationTable(AppBlock.java:266)
...
{code}
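
A hypothetical repro sequence (the host name is a placeholder; the application 
id matches the log above):
{code}
# 1. run an application, restart the RM, then run a second application
# 2. request the second app's page from the AHS web UI:
curl -i http://RmHostName:8188/applicationhistory/app/application_1436472584878_0001
# the response is the 500 above, and the NPE appears in the AHS log
{code}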





[jira] [Created] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-06-04 Thread Eric Payne (JIRA)
Eric Payne created YARN-3769:


 Summary: Preemption occurring unnecessarily because preemption 
doesn't consider user limit
 Key: YARN-3769
 URL: https://issues.apache.org/jira/browse/YARN-3769
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.7.0, 2.6.0, 2.8.0
Reporter: Eric Payne
Assignee: Eric Payne


We are seeing the preemption monitor preempting containers from queue A and 
then seeing the capacity scheduler giving them immediately back to queue A. 
This happens quite often and causes a lot of churn. Because the preemption 
monitor does not consider user limit, it can preempt to satisfy pending demand 
that the scheduler will never actually place, since the requesting user is 
already at its user limit.





[jira] [Created] (YARN-3540) Fetcher#copyMapOutput is leaking usedMemory upon IOException during InMemoryMapOutput shuffle handler

2015-04-23 Thread Eric Payne (JIRA)
Eric Payne created YARN-3540:


 Summary: Fetcher#copyMapOutput is leaking usedMemory upon 
IOException during InMemoryMapOutput shuffle handler
 Key: YARN-3540
 URL: https://issues.apache.org/jira/browse/YARN-3540
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Eric Payne
Assignee: Eric Payne
Priority: Blocker


We are seeing this happen when
- an NM's disk goes bad during the creation of map output(s)
- the reducer's fetcher can read the shuffle header and reserve the memory
- but gets an IOException when trying to shuffle for InMemoryMapOutput
- shuffle fetch retry is enabled






[jira] [Created] (YARN-3275) Preemption happening on non-preemptable queues

2015-02-27 Thread Eric Payne (JIRA)
Eric Payne created YARN-3275:


 Summary: Preemption happening on non-preemptable queues
 Key: YARN-3275
 URL: https://issues.apache.org/jira/browse/YARN-3275
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Eric Payne
Assignee: Eric Payne


YARN-2056 introduced the ability to turn preemption on and off at the queue 
level. In cases where a queue goes over its absolute max capacity (YARN-3243, 
for example), containers can be preempted from that queue, even though the 
queue is marked as non-preemptable.

We are using this feature in large, busy clusters and seeing this behavior.






[jira] [Resolved] (YARN-2592) Preemption can kill containers to fulfil need of already over-capacity queue.

2015-02-27 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne resolved YARN-2592.
--
Resolution: Invalid

> Preemption can kill containers to fulfil need of already over-capacity queue.
> -
>
> Key: YARN-2592
> URL: https://issues.apache.org/jira/browse/YARN-2592
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.5.1
>    Reporter: Eric Payne
>
> There are scenarios in which one over-capacity queue can cause preemption of 
> another over-capacity queue. However, since killing containers may lose work, 
> it doesn't make sense to me to kill containers to feed an already 
> over-capacity queue.
> Consider the following:
> {code}
> root has A,B,C, total capacity = 90
> A.guaranteed = 30, A.pending = 5, A.current = 40
> B.guaranteed = 30, B.pending = 0, B.current = 50
> C.guaranteed = 30, C.pending = 0, C.current = 0
> {code}
> In this case, the queue preemption monitor will kill 5 resources from queue B 
> so that queue A can pick them up, even though queue A is already over its 
> capacity. This could lose any work that those containers in B had already 
> done.
> Is there a use case for this behavior? It seems to me that if a queue is 
> already over its capacity, it shouldn't destroy the work of other queues. If 
> the over-capacity queue needs more resources, that seems to be a problem that 
> should be solved by increasing its guarantee.






[jira] [Created] (YARN-2932) Add entry for preemption setting to queue status screen and startup/refresh logging

2014-12-08 Thread Eric Payne (JIRA)
Eric Payne created YARN-2932:


 Summary: Add entry for preemption setting to queue status screen 
and startup/refresh logging
 Key: YARN-2932
 URL: https://issues.apache.org/jira/browse/YARN-2932
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.7.0
Reporter: Eric Payne


YARN-2056 introduced the ability to turn preemption on or off at the queue 
level. This JIRA will surface the preemption status for each queue in the 
{{HOST:8088/cluster/scheduler}} UI and in the RM log during startup and queue 
refresh.





[jira] [Created] (YARN-2592) Preemption can kill containers to fulfil need of already over-capacity queue.

2014-09-23 Thread Eric Payne (JIRA)
Eric Payne created YARN-2592:


 Summary: Preemption can kill containers to fulfil need of already 
over-capacity queue.
 Key: YARN-2592
 URL: https://issues.apache.org/jira/browse/YARN-2592
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.5.1, 3.0.0
Reporter: Eric Payne


There are scenarios in which one over-capacity queue can cause preemption of 
another over-capacity queue. However, since killing containers may lose work, 
it doesn't make sense to me to kill containers to feed an already over-capacity 
queue.

Consider the following:

{code}
root has A,B,C, total capacity = 90
A.guaranteed = 30, A.pending = 5, A.current = 40
B.guaranteed = 30, B.pending = 0, B.current = 50
C.guaranteed = 30, C.pending = 0, C.current = 0
{code}

In this case, the queue preemption monitor will kill 5 resources from queue B 
so that queue A can pick them up, even though queue A is already over its 
capacity. This could lose any work that those containers in B had already done.

Is there a use case for this behavior? It seems to me that if a queue is 
already over its capacity, it shouldn't destroy the work of other queues. If 
the over-capacity queue needs more resources, that seems to be a problem that 
should be solved by increasing its guarantee.





[jira] [Created] (YARN-2024) IOException in AppLogAggregatorImpl does not give stacktrace and leaves aggregated TFile in a bad state.

2014-05-06 Thread Eric Payne (JIRA)
Eric Payne created YARN-2024:


 Summary: IOException in AppLogAggregatorImpl does not give 
stacktrace and leaves aggregated TFile in a bad state.
 Key: YARN-2024
 URL: https://issues.apache.org/jira/browse/YARN-2024
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0, 0.23.10
Reporter: Eric Payne


Multiple issues arose when AppLogAggregatorImpl encountered an IOException in 
AppLogAggregatorImpl#uploadLogsForContainer while aggregating yarn-logs for an 
application that had very large (>150G each) error logs.
- An IOException was encountered during the LogWriter#append call, and a 
message was printed, but no stacktrace was provided. Message: "ERROR: Couldn't 
upload logs for container_n_nnn_nn_nn. Skipping this 
container."
- After the IOException, the TFile is in a bad state, so subsequent calls to 
LogWriter#append fail with the following stacktrace:
2014-04-16 13:29:09,772 [LogAggregationService #17907] ERROR 
org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
Thread[LogAggregationService #17907,5,main] threw an Exception.
java.lang.IllegalStateException: Incorrect state to start a new key: IN_VALUE
at 
org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:528)
at 
org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.append(AggregatedLogFormat.java:262)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainer(AppLogAggregatorImpl.java:128)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:164)
...
- At this point, the yarn-logs cleaner still thinks the thread is aggregating, 
so the huge yarn-logs never get cleaned up for that application.





[jira] [Created] (YARN-1115) Provide optional means for a scheduler to check real user ACLs

2013-08-28 Thread Eric Payne (JIRA)
Eric Payne created YARN-1115:


 Summary: Provide optional means for a scheduler to check real user 
ACLs
 Key: YARN-1115
 URL: https://issues.apache.org/jira/browse/YARN-1115
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 0.23.9, 2.1.0-beta
Reporter: Eric Payne


In the framework for secure implementation using UserGroupInformation.doAs 
(http://hadoop.apache.org/docs/stable/Secure_Impersonation.html), a trusted 
superuser can submit jobs on behalf of another user in a secure way. In this 
framework, the superuser is referred to as the real user and the proxied user 
is referred to as the effective user.
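
For reference, the impersonation setup from that page boils down to 
core-site.xml entries like the following (user, host, and group values are 
illustrative):
{noformat}
<property>
  <name>hadoop.proxyuser.super.hosts</name>
  <value>client-host.example.com</value>
</property>
<property>
  <name>hadoop.proxyuser.super.groups</name>
  <value>group1,group2</value>
</property>
{noformat}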

Currently, when a job is submitted as an effective user, the ACLs of the 
effective user are checked against the queue on which the job is to be run. 
The scheduler should also check the ACLs of the real user when an optional 
configuration property enabling that check is set.

For example, suppose my superuser name is super, and super is configured to 
securely proxy as joe. Also suppose there is a Hadoop queue named ops which 
only allows ACLs for super, not for joe.

When super proxies to joe in order to submit a job to the ops queue, it will 
fail because joe, as the effective user, does not have ACLs on the ops queue.

In many cases this is what you want, in order to protect queues that joe should 
not be using.

However, there are times when super may need to proxy as many users, while the 
client running as super just wants to use the ops queue: the ops queue is 
already dedicated to the client's purpose, and, to keep it dedicated to that 
purpose, super doesn't want to open up its ACLs to joe in general. Without this 
functionality, the client running as super needs to figure out which queue each 
user has ACLs opened up for, and then coordinate with other tasks using those 
queues.

