[jira] [Commented] (YARN-1487) How to develop with Eclipse

2013-12-10 Thread Yang Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844068#comment-13844068
 ] 

Yang Hao commented on YARN-1487:


When I compile the plugin, I get the following errors:

[ivy:resolve]   ::
[ivy:resolve]   ::  UNRESOLVED DEPENDENCIES ::
[ivy:resolve]   ::
[ivy:resolve]   :: org.apache.hadoop#hadoop-mapreduce-client-jobclient;2.2.0: not found
[ivy:resolve]   :: org.apache.hadoop#hadoop-mapreduce-client-core;2.2.0: not found
[ivy:resolve]   :: org.apache.hadoop#hadoop-mapreduce-client-common;2.2.0: not found
[ivy:resolve]   :: org.apache.hadoop#hadoop-hdfs;2.2.0: not found
[ivy:resolve]   :: org.apache.hadoop#hadoop-common;2.2.0: not found
[ivy:resolve]   ::
[ivy:resolve] 
[ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS


> How to develop with Eclipse
> ---
>
> Key: YARN-1487
> URL: https://issues.apache.org/jira/browse/YARN-1487
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications
>Affects Versions: 2.2.0
> Environment: Linux, Hadoop 2
>Reporter: Yang Hao
>  Labels: eclipse, plugin, yarn
> Fix For: 2.2.0
>
>
> We can develop an application in Eclipse, but no Eclipse plugin is provided 
> for Hadoop 2. Will the new version provide an Eclipse plugin for developers?



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1029) Allow embedding leader election into the RM

2013-12-10 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844110#comment-13844110
 ] 

Karthik Kambatla commented on YARN-1029:


Manually testing the posted patch on a cluster showed that automatic failover 
works. However, automatic failover fails to take over after an explicit manual 
failover. To address this, RMActiveStandbyElector should implement ZKFCProtocol 
and RMHAServiceTarget#getZKFCProxy should return a proxy to it. I will address 
this and other minor details in the next patch.

> Allow embedding leader election into the RM
> ---
>
> Key: YARN-1029
> URL: https://issues.apache.org/jira/browse/YARN-1029
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Karthik Kambatla
> Attachments: yarn-1029-approach.patch
>
>
> It should be possible to embed the common ActiveStandbyElector into the RM such 
> that ZooKeeper-based leader election and notification is built in. In 
> conjunction with a ZK state store, this configuration will be a simple 
> deployment option.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1029) Allow embedding leader election into the RM

2013-12-10 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844117#comment-13844117
 ] 

Bikas Saha commented on YARN-1029:
--

What are the pros and cons of using ZKFC embedded vs ActiveStandbyElector? If 
ActiveStandbyElector has to implement the ZKFC protocol, then are we better off 
just using ZKFC embedded directly?

> Allow embedding leader election into the RM
> ---
>
> Key: YARN-1029
> URL: https://issues.apache.org/jira/browse/YARN-1029
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Karthik Kambatla
> Attachments: yarn-1029-approach.patch
>
>
> It should be possible to embed the common ActiveStandbyElector into the RM such 
> that ZooKeeper-based leader election and notification is built in. In 
> conjunction with a ZK state store, this configuration will be a simple 
> deployment option.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1481) ResourceManager and AdminService interact in a convoluted manner after YARN-1318

2013-12-10 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844127#comment-13844127
 ] 

Karthik Kambatla commented on YARN-1481:


Thanks [~vinodkv]. The patch looks good to me. One minor nit: 
{{AdminService#isRMActive()}} need not be synchronized. I am okay with 
addressing the nit in another HA JIRA - maybe YARN-1029.

+1 otherwise. I will wait for comments until the end of the day and then commit it.

> ResourceManager and AdminService interact in a convoluted manner after 
> YARN-1318
> 
>
> Key: YARN-1481
> URL: https://issues.apache.org/jira/browse/YARN-1481
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
> Attachments: YARN-1481-20131207.txt, YARN-1481-20131209.txt
>
>
> This is something I found while reviewing YARN-1318, but didn't halt that 
> patch as many cycles had already gone into it. Some top-level issues:
>  - Not easy to follow the RM's service lifecycle
> -- RM adds only AdminService as its service directly.
> -- Other services are added to RM when AdminService's init calls 
> RM.activeServices.init()
>  - Overall, AdminService shouldn't encompass all of RM's HA state management. 
> It was originally supposed to be the implementation of just the RPC server.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1029) Allow embedding leader election into the RM

2013-12-10 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844182#comment-13844182
 ] 

Karthik Kambatla commented on YARN-1029:


Correction: actually, it would be the AdminService that would have to implement 
ZKFCProtocol, not ActiveStandbyElector.

bq. What are the pros and cons of using ZKFC embedded vs ActiveStandbyElector?
Indeed, my first implementation embedded ZKFC. While it works fine, I 
found it roundabout, and it has some avoidable overhead. Embedding 
ActiveStandbyElector definitely seems like a simpler, cleaner approach.

Cons of ZKFC / Pros of ActiveStandbyElector:
# ZKFC communicates with the RM through RPC; when embedded, both are in the same 
process.
# In addition to ActiveStandbyElector, ZKFC has other overheads - health 
monitoring, fencing, etc. - which might not be required in a simple embedded 
option.
# ZKFC#formatZK() needs to be exposed through rmadmin, which complicates it 
further.
# Embedding ZKFC isn't very clean.

Cons of ActiveStandbyElector: AFAIK, the only drawback of ActiveStandbyElector 
is having AdminService implement ZKFCProtocol - two methods: cedeActive() and 
gracefulFailover(). These methods are simple and straightforward, and are 
needed only to be able to safely fail over manually when automatic failover is 
enabled.
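
For illustration, a minimal sketch of what having AdminService implement these two methods could look like, assuming the embedded elector is wrapped by a small component the AdminService can call into; the class, interface, and field names below are placeholders rather than the actual patch:

{code:title=AdminService sketch (hypothetical)|borderStyle=solid}
import java.io.IOException;

import org.apache.hadoop.ha.ZKFCProtocol;
import org.apache.hadoop.security.AccessControlException;
import org.apache.hadoop.service.CompositeService;

// Hypothetical sketch only: AdminService exposing the two ZKFCProtocol
// operations needed for safe manual failover when automatic failover is on.
public class AdminService extends CompositeService implements ZKFCProtocol {

  // Placeholder for whatever component wraps the embedded ActiveStandbyElector.
  public interface ElectorHandle {
    void cedeActive(int millisToCede) throws IOException;
    void gracefulFailover() throws IOException;
  }

  private final ElectorHandle elector;

  public AdminService(ElectorHandle elector) {
    super(AdminService.class.getName());
    this.elector = elector;
  }

  @Override
  public void cedeActive(int millisToCede)
      throws IOException, AccessControlException {
    // Leave the election for the requested period so the other RM can go active.
    elector.cedeActive(millisToCede);
  }

  @Override
  public void gracefulFailover() throws IOException, AccessControlException {
    // Relinquish active state and rejoin the election for a clean manual failover.
    elector.gracefulFailover();
  }
}
{code}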

> Allow embedding leader election into the RM
> ---
>
> Key: YARN-1029
> URL: https://issues.apache.org/jira/browse/YARN-1029
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Karthik Kambatla
> Attachments: yarn-1029-approach.patch
>
>
> It should be possible to embed the common ActiveStandbyElector into the RM such 
> that ZooKeeper-based leader election and notification is built in. In 
> conjunction with a ZK state store, this configuration will be a simple 
> deployment option.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1448) AM-RM protocol changes to support container resizing

2013-12-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844189#comment-13844189
 ] 

Hudson commented on YARN-1448:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #417 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/417/])
YARN-1448. AM-RM protocol changes to support container resizing (Wangda Tan via 
Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1549627)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/AllocateRequest.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/AllocateResponse.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/AllocateRequestPBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/AllocateResponsePBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestAllocateRequest.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestAllocateResponse.java


> AM-RM protocol changes to support container resizing
> 
>
> Key: YARN-1448
> URL: https://issues.apache.org/jira/browse/YARN-1448
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, resourcemanager
>Affects Versions: 2.2.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Fix For: 2.4.0
>
> Attachments: yarn-1448.1.patch, yarn-1448.2.patch, yarn-1448.3.patch
>
>
> As described in YARN-1197, we need to add APIs in the RM to support:
> 1) adding increase requests in AllocateRequest
> 2) getting successfully increased/decreased containers from the RM in 
> AllocateResponse
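
For context, a rough illustration of the shape these additions take; the record and method names below are assumptions for illustration only, and the committed AllocateRequest/AllocateResponse/proto changes listed in this build are authoritative:

{code:title=Container resizing protocol sketch (illustrative)|borderStyle=solid}
import java.util.List;

// Illustrative sketch only; names are assumptions, not the committed API.
final class ContainerResizeProtocolSketch {

  // Placeholder for a request to change an existing container's resources.
  static final class IncreaseRequest {
    String containerId;   // container to grow (placeholder for the real record)
    int newMemoryMb;      // requested new memory
    int newVirtualCores;  // requested new vcores
  }

  // 1) AM side: increase requests ride along with the allocate call.
  interface AllocateRequestAdditions {
    void setIncreaseRequests(List<IncreaseRequest> increaseRequests);
    List<IncreaseRequest> getIncreaseRequests();
  }

  // 2) RM side: the response reports containers whose resources actually changed.
  interface AllocateResponseAdditions {
    List<String> getIncreasedContainers();
    List<String> getDecreasedContainers();
  }
}
{code}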



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1053) Diagnostic message from ContainerExitEvent is ignored in ContainerImpl

2013-12-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844209#comment-13844209
 ] 

Hudson commented on YARN-1053:
--

FAILURE: Integrated in Hadoop-Hdfs-0.23-Build #816 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/816/])
svn merge -c 1543973 FIXES: YARN-1053. Diagnostic message from 
ContainerExitEvent is ignored in ContainerImpl. Contributed by Omkar Vinit 
Joshi (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1549691)
* /hadoop/common/branches/branch-0.23/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/TestContainer.java


> Diagnostic message from ContainerExitEvent is ignored in ContainerImpl
> --
>
> Key: YARN-1053
> URL: https://issues.apache.org/jira/browse/YARN-1053
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Omkar Vinit Joshi
>Assignee: Omkar Vinit Joshi
>Priority: Blocker
>  Labels: newbie
> Fix For: 2.4.0, 0.23.11
>
> Attachments: YARN-1053.1.patch, YARN-1053.20130809.patch
>
>
> If the container launch fails, we send a ContainerExitEvent. This event 
> contains the exit code and a diagnostic message. Today we ignore the diagnostic 
> message while handling this event inside ContainerImpl. Fixing this, as the 
> message is useful in diagnosing the failure.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1448) AM-RM protocol changes to support container resizing

2013-12-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844271#comment-13844271
 ] 

Hudson commented on YARN-1448:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1608 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1608/])
YARN-1448. AM-RM protocol changes to support container resizing (Wangda Tan via 
Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1549627)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/AllocateRequest.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/AllocateResponse.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/AllocateRequestPBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/AllocateResponsePBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestAllocateRequest.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestAllocateResponse.java


> AM-RM protocol changes to support container resizing
> 
>
> Key: YARN-1448
> URL: https://issues.apache.org/jira/browse/YARN-1448
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, resourcemanager
>Affects Versions: 2.2.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Fix For: 2.4.0
>
> Attachments: yarn-1448.1.patch, yarn-1448.2.patch, yarn-1448.3.patch
>
>
> As described in YARN-1197, we need to add APIs in the RM to support:
> 1) adding increase requests in AllocateRequest
> 2) getting successfully increased/decreased containers from the RM in 
> AllocateResponse



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1448) AM-RM protocol changes to support container resizing

2013-12-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844306#comment-13844306
 ] 

Hudson commented on YARN-1448:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1634 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1634/])
YARN-1448. AM-RM protocol changes to support container resizing (Wangda Tan via 
Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1549627)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/AllocateRequest.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/AllocateResponse.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_service_protos.proto
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/AllocateRequestPBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/impl/pb/AllocateResponsePBImpl.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestAllocateRequest.java
* /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestAllocateResponse.java


> AM-RM protocol changes to support container resizing
> 
>
> Key: YARN-1448
> URL: https://issues.apache.org/jira/browse/YARN-1448
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, resourcemanager
>Affects Versions: 2.2.0
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Fix For: 2.4.0
>
> Attachments: yarn-1448.1.patch, yarn-1448.2.patch, yarn-1448.3.patch
>
>
> As described in YARN-1197, we need to add APIs in the RM to support:
> 1) adding increase requests in AllocateRequest
> 2) getting successfully increased/decreased containers from the RM in 
> AllocateResponse



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling

2013-12-10 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844344#comment-13844344
 ] 

Arun C Murthy commented on YARN-1404:
-

I've spent time thinking about this in the context of running a myriad of 
external systems in YARN such as Impala, HDFS Caching (HDFS-4949) and some 
others.

The overarching goal is to allow YARN to act as a ResourceManager for the 
overall cluster *and* a Workload Manager for external systems, i.e., this way 
Impala or HDFS can rely on YARN's queues for workload management, SLAs via 
preemption, etc.

Is that a good characterization of the problem at hand?

I think it's a good goal to support - this will allow other external systems to 
leverage YARN's capabilities for both resource sharing and workload management.

Now, if we all agree on this, we can figure out the best way to support this in a 
first-class manner.



Ok, the core requirement is for an external system (Impala, HDFS, others) to 
leverage YARN's workload management capabilities (queues etc.) to acquire 
resources (cpu, memory) *on behalf* of a particular entity (user, queue) for 
completing a user's request (run a query, cache a dataset in RAM). 

The *key* is that these external systems need to acquire resources on behalf of 
the user and ensure that the chargeback is applied to the correct user, queue 
etc.

This is a *brand new requirement* for YARN... so far, we have assumed that the 
entity acquiring the resource would also be actually utilizing the resource by 
launching a container etc. 

Here, it's clear that the requirement is that the entity acquiring the resource 
would like to *delegate* the resource to an external framework. For example:
# A user query would like to acquire cpu, memory, etc. for appropriate 
accounting chargeback and then delegate them to Impala.
# A user request for caching data would like to acquire memory for appropriate 
accounting chargeback and then delegate it to the Datanode.



In this scenario, I think explicitly allowing for *delegation* of a container 
would solve the problem in a first-class manner.

We should add a new API to the NodeManager which would allow an application to 
*delegate* a container's resources to a different container:

{code:title=ContainerManagementProtocol.java|borderStyle=solid}
public interface ContainerManagementProtocol {
  // ...
  // Delegate the source container's resources to the target container.
  public DelegateContainerResponse delegateContainer(DelegateContainerRequest request);
  // ...
}
{code}

{code:title=DelegateContainerRequest.java|borderStyle=solid}
public abstract class DelegateContainerRequest {
  // ...
  // Launch context of the (source) container whose resources are being delegated.
  public ContainerLaunchContext getSourceContainer();

  // Id of the (target) container that receives the delegated resources.
  public ContainerId getTargetContainer();
  // ...
}
{code}
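
To make the intended call flow concrete, a hypothetical caller-side sketch; the simplified records, protocol shape, and client wiring below are assumptions for illustration, not an existing API:

{code:title=Delegation usage sketch (hypothetical)|borderStyle=solid}
// Hypothetical usage sketch for the proposed delegation API.
public class DelegationSketch {

  // Simplified stand-in for the proposed request record (ids only, for brevity).
  public static final class DelegateContainerRequest {
    public final String sourceContainerId;  // container whose resources are given up
    public final String targetContainerId;  // long-running service container (e.g. Impala)
    public DelegateContainerRequest(String source, String target) {
      this.sourceContainerId = source;
      this.targetContainerId = target;
    }
  }

  // Simplified stand-in for the proposed NM-side protocol.
  public interface ContainerManagementProtocolSketch {
    // NM enlarges the target container's cgroup by the source container's share.
    void delegateContainer(DelegateContainerRequest request);
  }

  // An AM that acquired a container on behalf of a user query could then hand
  // its cpu/memory over to the co-located service container.
  public static void delegate(ContainerManagementProtocolSketch nm,
      String acquiredContainerId, String serviceContainerId) {
    nm.delegateContainer(
        new DelegateContainerRequest(acquiredContainerId, serviceContainerId));
  }
}
{code}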


The implementation of this API would notify the NodeManager to change its 
monitoring of the recipient container (i.e. Impala or the Datanode) by modifying 
the cgroup of the recipient container.

Similarly, the NodeManager could be instructed by the ResourceManager to 
preempt the resources of the source container to continue serving the 
global SLAs of the queues - again, this is implemented by modifying the cgroup 
of the recipient container. This will allow the ResourceManager/NodeManager to 
be explicitly in control of resources, even in the face of misbehaving AMs, etc.



The result of the above proposal is very similar to what is already being 
discussed, the only difference being that this is explicit (the NodeManager knows 
the source and recipient containers) and that it allows all existing features, 
such as preemption and over-allocation of resources to YARN queues, to 
continue to work as they do today.



Thoughts?

> Enable external systems/frameworks to share resources with Hadoop leveraging 
> Yarn resource scheduling
> -
>
> Key: YARN-1404
> URL: https://issues.apache.org/jira/browse/YARN-1404
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.2.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Attachments: YARN-1404.patch
>
>
> Currently Hadoop Yarn expects to manage the lifecycle of the processes its 
> applications run workload in. External frameworks/systems could benefit from 
> sharing resources with other Yarn applications while running their workload 
> within long-running processes owned by the external framework (in other 
> words, running their workload outside of the context of a Yarn container 
> process). 
> Because Yarn provides robust and scalable resource management, it is 
> desirable for some external systems to leverage the resource governance 
> capabilities of Yarn (queues, capacities, scheduling, access control) while 
> supplying their own resource enforcement.
> Impala is an example of such a system.

[jira] [Created] (YARN-1488) Allow containers to delegate resources to another container

2013-12-10 Thread Arun C Murthy (JIRA)
Arun C Murthy created YARN-1488:
---

 Summary: Allow containers to delegate resources to another 
container
 Key: YARN-1488
 URL: https://issues.apache.org/jira/browse/YARN-1488
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Arun C Murthy


We should allow containers to delegate resources to another container. This 
would allow external frameworks to share not just YARN's resource-management 
capabilities but also its workload-management capabilities.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1488) Allow containers to delegate resources to another container

2013-12-10 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844348#comment-13844348
 ] 

Arun C Murthy commented on YARN-1488:
-

We should add a new API to the NodeManager which would allow an application to 
*delegate* a container's resources to a different container:

{code:title=ContainerManagementProtocol.java|borderStyle=solid}
public interface ContainerManagementProtocol {
  // ...
  // Delegate the source container's resources to the target container.
  public DelegateContainerResponse delegateContainer(DelegateContainerRequest request);
  // ...
}
{code}

{code:title=DelegateContainerRequest.java|borderStyle=solid}
public abstract class DelegateContainerRequest {
  // ...
  // Launch context of the (source) container whose resources are being delegated.
  public ContainerLaunchContext getSourceContainer();

  // Id of the (target) container that receives the delegated resources.
  public ContainerId getTargetContainer();
  // ...
}
{code}


The implementation of this API would notify the NodeManager to change its 
monitoring of the recipient container (i.e. Impala or the Datanode) by modifying 
the cgroup of the recipient container.

Similarly, the NodeManager could be instructed by the ResourceManager to 
preempt the resources of the source container to continue serving the 
global SLAs of the queues - again, this is implemented by modifying the cgroup 
of the recipient container. This will allow the ResourceManager/NodeManager to 
be explicitly in control of resources, even in the face of misbehaving AMs, etc.

> Allow containers to delegate resources to another container
> ---
>
> Key: YARN-1488
> URL: https://issues.apache.org/jira/browse/YARN-1488
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Arun C Murthy
>
> We should allow containers to delegate resources to another container. This 
> would allow external frameworks to share not just YARN's resource-management 
> capabilities but also its workload-management capabilities.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling

2013-12-10 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844352#comment-13844352
 ] 

Arun C Murthy commented on YARN-1404:
-

I've opened YARN-1488 to track delegation of container resources.

> Enable external systems/frameworks to share resources with Hadoop leveraging 
> Yarn resource scheduling
> -
>
> Key: YARN-1404
> URL: https://issues.apache.org/jira/browse/YARN-1404
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.2.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Attachments: YARN-1404.patch
>
>
> Currently Hadoop Yarn expects to manage the lifecycle of the processes its 
> applications run workload in. External frameworks/systems could benefit from 
> sharing resources with other Yarn applications while running their workload 
> within long-running processes owned by the external framework (in other 
> words, running their workload outside of the context of a Yarn container 
> process). 
> Because Yarn provides robust and scalable resource management, it is 
> desirable for some external systems to leverage the resource governance 
> capabilities of Yarn (queues, capacities, scheduling, access control) while 
> supplying their own resource enforcement.
> Impala is an example of such a system. Impala uses Llama 
> (http://cloudera.github.io/llama/) to request resources from Yarn.
> Impala runs an impalad process on every node of the cluster; when a user 
> submits a query, the processing is broken into 'query fragments' which are 
> run in multiple impalad processes leveraging data locality (similar to 
> Map-Reduce Mappers processing a collocated HDFS block of input data).
> The execution of a 'query fragment' requires an amount of CPU and memory in 
> the impalad. The impalad shares the host with other services (HDFS DataNode, 
> Yarn NodeManager, HBase Region Server) and Yarn applications (MapReduce 
> tasks).
> To ensure cluster utilization follows the Yarn scheduler policies and does 
> not overload the cluster nodes, before running a 'query fragment' on a node, 
> Impala requests the required amount of CPU and memory from Yarn. Once the 
> requested CPU and memory have been allocated, Impala runs the 'query 
> fragment', taking care that it does not use more resources than have been 
> allocated. Memory is bookkept per 'query fragment' and the threads used to 
> process the 'query fragment' are placed under a cgroup to contain CPU 
> utilization.
> Today, for all resources that have been requested from the Yarn RM, a 
> (container) process must be started via the corresponding NodeManager. 
> Failing to do this will result in the cancellation of the container 
> allocation, relinquishing the acquired resource capacity back to the pool of 
> available resources. To avoid this, Impala starts a dummy container process 
> that does 'sleep 10y'.
> Using a dummy container process has its drawbacks:
> * the dummy container process is in a cgroup with a given number of CPU 
> shares that are not used, and Impala re-issues those CPU shares to another 
> cgroup for the thread running the 'query fragment'. The cgroup CPU 
> enforcement works correctly because of the CPU controller implementation (but 
> the formally specified behavior is actually undefined).
> * Impala may ask for CPU and memory independently of each other. Some 
> requests may be memory-only with no CPU, or vice versa. Because a container 
> requires a process, the complete absence of memory or CPU is not possible; 
> even if the dummy process is 'sleep', a minimal amount of memory and CPU is 
> required for the dummy process.
> Because of this, it is desirable to be able to have a container without a 
> backing process.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2013-12-10 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844354#comment-13844354
 ] 

Arun C Murthy commented on YARN-1197:
-

Sorry to come in late - I'm +1 on the overall idea/approach.

However, I feel we still have to work through the details on the scheduler side. 
So, I'd like to see this developed in a branch. This would allow a full 
picture to emerge before we commit it to a specific release (2.4 vs. 2.5, etc.). 
Thoughts?

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: mapreduce-project.patch.ver.1, 
> tools-project.patch.ver.1, yarn-1197-v2.pdf, yarn-1197-v3.pdf, 
> yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, 
> yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, 
> yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, 
> yarn-server-resourcemanager.patch.ver.1
>
>
> The current YARN resource management logic assumes the resources allocated to 
> a container are fixed for its lifetime. When users want to change the 
> resources of an allocated container, the only way is to release it and 
> allocate a new container with the expected size.
> Allowing run-time changes to the resources of an allocated container will give 
> us better control of resource usage on the application side.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2013-12-10 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844370#comment-13844370
 ] 

Bikas Saha commented on YARN-1197:
--

There are some plumbing/infra-related changes which we could commit to trunk 
safely. None of that would be executed until some scheduler actually supports 
this. When that happens, we could decide to move the code to branch-2 to target 
a release. I would prefer that to a branch, which would need maintenance.

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: mapreduce-project.patch.ver.1, 
> tools-project.patch.ver.1, yarn-1197-v2.pdf, yarn-1197-v3.pdf, 
> yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, 
> yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, 
> yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, 
> yarn-server-resourcemanager.patch.ver.1
>
>
> The current YARN resource management logic assumes the resources allocated to 
> a container are fixed for its lifetime. When users want to change the 
> resources of an allocated container, the only way is to release it and 
> allocate a new container with the expected size.
> Allowing run-time changes to the resources of an allocated container will give 
> us better control of resource usage on the application side.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2013-12-10 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844381#comment-13844381
 ] 

Arun C Murthy commented on YARN-1197:
-

The problem is that we can't ship half of this feature in 2.4 - it's either in 
or out. So a branch would be significantly better: the whole feature is either 
in or out for 2.4.

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: mapreduce-project.patch.ver.1, 
> tools-project.patch.ver.1, yarn-1197-v2.pdf, yarn-1197-v3.pdf, 
> yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, 
> yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, 
> yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, 
> yarn-server-resourcemanager.patch.ver.1
>
>
> The current YARN resource management logic assumes the resources allocated to 
> a container are fixed for its lifetime. When users want to change the 
> resources of an allocated container, the only way is to release it and 
> allocate a new container with the expected size.
> Allowing run-time changes to the resources of an allocated container will give 
> us better control of resource usage on the application side.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling

2013-12-10 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844398#comment-13844398
 ] 

Bikas Saha commented on YARN-1404:
--

Is the scenario that containers from multiple users ask for resources 
within their quota and then delegate them to a shared service to use on their 
behalf? The above would imply that the datanode/impala/others would be running as 
yarn containers so that they can be targets for delegation.

> Enable external systems/frameworks to share resources with Hadoop leveraging 
> Yarn resource scheduling
> -
>
> Key: YARN-1404
> URL: https://issues.apache.org/jira/browse/YARN-1404
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.2.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Attachments: YARN-1404.patch
>
>
> Currently Hadoop Yarn expects to manage the lifecycle of the processes its 
> applications run workload in. External frameworks/systems could benefit from 
> sharing resources with other Yarn applications while running their workload 
> within long-running processes owned by the external framework (in other 
> words, running their workload outside of the context of a Yarn container 
> process). 
> Because Yarn provides robust and scalable resource management, it is 
> desirable for some external systems to leverage the resource governance 
> capabilities of Yarn (queues, capacities, scheduling, access control) while 
> supplying their own resource enforcement.
> Impala is an example of such a system. Impala uses Llama 
> (http://cloudera.github.io/llama/) to request resources from Yarn.
> Impala runs an impalad process on every node of the cluster; when a user 
> submits a query, the processing is broken into 'query fragments' which are 
> run in multiple impalad processes leveraging data locality (similar to 
> Map-Reduce Mappers processing a collocated HDFS block of input data).
> The execution of a 'query fragment' requires an amount of CPU and memory in 
> the impalad. The impalad shares the host with other services (HDFS DataNode, 
> Yarn NodeManager, HBase Region Server) and Yarn applications (MapReduce 
> tasks).
> To ensure cluster utilization follows the Yarn scheduler policies and does 
> not overload the cluster nodes, before running a 'query fragment' on a node, 
> Impala requests the required amount of CPU and memory from Yarn. Once the 
> requested CPU and memory have been allocated, Impala runs the 'query 
> fragment', taking care that it does not use more resources than have been 
> allocated. Memory is bookkept per 'query fragment' and the threads used to 
> process the 'query fragment' are placed under a cgroup to contain CPU 
> utilization.
> Today, for all resources that have been requested from the Yarn RM, a 
> (container) process must be started via the corresponding NodeManager. 
> Failing to do this will result in the cancellation of the container 
> allocation, relinquishing the acquired resource capacity back to the pool of 
> available resources. To avoid this, Impala starts a dummy container process 
> that does 'sleep 10y'.
> Using a dummy container process has its drawbacks:
> * the dummy container process is in a cgroup with a given number of CPU 
> shares that are not used, and Impala re-issues those CPU shares to another 
> cgroup for the thread running the 'query fragment'. The cgroup CPU 
> enforcement works correctly because of the CPU controller implementation (but 
> the formally specified behavior is actually undefined).
> * Impala may ask for CPU and memory independently of each other. Some 
> requests may be memory-only with no CPU, or vice versa. Because a container 
> requires a process, the complete absence of memory or CPU is not possible; 
> even if the dummy process is 'sleep', a minimal amount of memory and CPU is 
> required for the dummy process.
> Because of this, it is desirable to be able to have a container without a 
> backing process.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling

2013-12-10 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844437#comment-13844437
 ] 

Arun C Murthy commented on YARN-1404:
-

Yes, agreed. Sorry, I thought it was clear that this was what I was proposing with:

{quote}
The implementation of this API would notify the NodeManager to change its 
monitoring of the recipient container (i.e. Impala or the Datanode) by modifying 
the cgroup of the recipient container.
Similarly, the NodeManager could be instructed by the ResourceManager to 
preempt the resources of the source container to continue serving the 
global SLAs of the queues - again, this is implemented by modifying the cgroup 
of the recipient container. This will allow the ResourceManager/NodeManager to 
be explicitly in control of resources, even in the face of misbehaving AMs, etc.
{quote}

> Enable external systems/frameworks to share resources with Hadoop leveraging 
> Yarn resource scheduling
> -
>
> Key: YARN-1404
> URL: https://issues.apache.org/jira/browse/YARN-1404
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.2.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Attachments: YARN-1404.patch
>
>
> Currently Hadoop Yarn expects to manage the lifecycle of the processes its 
> applications run workload in. External frameworks/systems could benefit from 
> sharing resources with other Yarn applications while running their workload 
> within long-running processes owned by the external framework (in other 
> words, running their workload outside of the context of a Yarn container 
> process). 
> Because Yarn provides robust and scalable resource management, it is 
> desirable for some external systems to leverage the resource governance 
> capabilities of Yarn (queues, capacities, scheduling, access control) while 
> supplying their own resource enforcement.
> Impala is an example of such a system. Impala uses Llama 
> (http://cloudera.github.io/llama/) to request resources from Yarn.
> Impala runs an impalad process on every node of the cluster; when a user 
> submits a query, the processing is broken into 'query fragments' which are 
> run in multiple impalad processes leveraging data locality (similar to 
> Map-Reduce Mappers processing a collocated HDFS block of input data).
> The execution of a 'query fragment' requires an amount of CPU and memory in 
> the impalad. The impalad shares the host with other services (HDFS DataNode, 
> Yarn NodeManager, HBase Region Server) and Yarn applications (MapReduce 
> tasks).
> To ensure cluster utilization follows the Yarn scheduler policies and does 
> not overload the cluster nodes, before running a 'query fragment' on a node, 
> Impala requests the required amount of CPU and memory from Yarn. Once the 
> requested CPU and memory have been allocated, Impala runs the 'query 
> fragment', taking care that it does not use more resources than have been 
> allocated. Memory is bookkept per 'query fragment' and the threads used to 
> process the 'query fragment' are placed under a cgroup to contain CPU 
> utilization.
> Today, for all resources that have been requested from the Yarn RM, a 
> (container) process must be started via the corresponding NodeManager. 
> Failing to do this will result in the cancellation of the container 
> allocation, relinquishing the acquired resource capacity back to the pool of 
> available resources. To avoid this, Impala starts a dummy container process 
> that does 'sleep 10y'.
> Using a dummy container process has its drawbacks:
> * the dummy container process is in a cgroup with a given number of CPU 
> shares that are not used, and Impala re-issues those CPU shares to another 
> cgroup for the thread running the 'query fragment'. The cgroup CPU 
> enforcement works correctly because of the CPU controller implementation (but 
> the formally specified behavior is actually undefined).
> * Impala may ask for CPU and memory independently of each other. Some 
> requests may be memory-only with no CPU, or vice versa. Because a container 
> requires a process, the complete absence of memory or CPU is not possible; 
> even if the dummy process is 'sleep', a minimal amount of memory and CPU is 
> required for the dummy process.
> Because of this, it is desirable to be able to have a container without a 
> backing process.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1121) RMStateStore should flush all pending store events before closing

2013-12-10 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844453#comment-13844453
 ] 

Bikas Saha commented on YARN-1121:
--

If the thread gets interrupted or otherwise exits unexpectedly, it does not look 
like drained will be set to true, and serviceStop will hang.
{code}
 while (!stopped && !Thread.currentThread().isInterrupted()) {
+  drained = eventQueue.isEmpty();
{code}

Also, it would probably be better if we signaled an object when we exit the 
above run() method and block on that signal instead of the following spin wait.
{code}
+  while(!drained) {
+Thread.yield();
+  }
{code}
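
A minimal sketch combining the two suggestions, assuming an AsyncDispatcher-style event loop; field and method names here are illustrative, not the actual patch: set drained in a finally block so an unexpected exit cannot hang serviceStop, and wait on a monitor instead of spinning.

{code:title=Drain sketch (illustrative)|borderStyle=solid}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch only, not the actual patch.
public class DrainSketch {
  private final BlockingQueue<Runnable> eventQueue = new LinkedBlockingQueue<Runnable>();
  private final Object drainLock = new Object();
  private volatile boolean drained = false;
  private volatile boolean stopped = false;

  void runEventLoop() {
    try {
      while (!stopped && !Thread.currentThread().isInterrupted()) {
        synchronized (drainLock) {
          drained = eventQueue.isEmpty();
          drainLock.notifyAll();            // wake a stop() waiting for the drain
        }
        eventQueue.take().run();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      synchronized (drainLock) {
        drained = true;                     // an unexpected exit must not hang stop()
        drainLock.notifyAll();
      }
    }
  }

  void waitForDrainOnStop() throws InterruptedException {
    synchronized (drainLock) {
      while (!drained) {
        drainLock.wait();                   // block on a signal instead of spinning
      }
    }
    stopped = true;
  }
}
{code}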

> RMStateStore should flush all pending store events before closing
> -
>
> Key: YARN-1121
> URL: https://issues.apache.org/jira/browse/YARN-1121
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Bikas Saha
>Assignee: Jian He
> Fix For: 2.4.0
>
> Attachments: YARN-1121.1.patch, YARN-1121.2.patch, YARN-1121.2.patch, 
> YARN-1121.3.patch, YARN-1121.4.patch, YARN-1121.5.patch, YARN-1121.6.patch, 
> YARN-1121.6.patch, YARN-1121.7.patch
>
>
> On serviceStop, it should wait for all internal pending events to drain before 
> stopping.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Assigned] (YARN-1413) [YARN-321] AHS WebUI should server aggregated logs as well

2013-12-10 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal reassigned YARN-1413:
---

Assignee: Mayank Bansal  (was: Zhijie Shen)

> [YARN-321] AHS WebUI should server aggregated logs as well
> --
>
> Key: YARN-1413
> URL: https://issues.apache.org/jira/browse/YARN-1413
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Mayank Bansal
>




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (YARN-1413) [YARN-321] AHS WebUI should server aggregated logs as well

2013-12-10 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-1413:


Attachment: YARN-1413-1.patch

Attaching the patch.

Thanks,
Mayank


> [YARN-321] AHS WebUI should server aggregated logs as well
> --
>
> Key: YARN-1413
> URL: https://issues.apache.org/jira/browse/YARN-1413
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Mayank Bansal
> Attachments: YARN-1413-1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1413) [YARN-321] AHS WebUI should server aggregated logs as well

2013-12-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844567#comment-13844567
 ] 

Hadoop QA commented on YARN-1413:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12618085/YARN-1413-1.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2640//console

This message is automatically generated.

> [YARN-321] AHS WebUI should server aggregated logs as well
> --
>
> Key: YARN-1413
> URL: https://issues.apache.org/jira/browse/YARN-1413
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhijie Shen
>Assignee: Mayank Bansal
> Attachments: YARN-1413-1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1040) De-link container life cycle from the process and add ability to execute multiple processes in the same long-lived container

2013-12-10 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844594#comment-13844594
 ] 

Hitesh Shah commented on YARN-1040:
---

Given the recent comments on YARN-1404, I believe that this should not be 
supported unless the resources are being delegated to another YARN container. 

Furthermore, if we are talking about container leases (for multiple process 
launches and not doing any resource delegation), a container lease should 
start when the first process is launched - so having an API that supports 
a null ContainerLaunchContext is moot. The lease aspects should probably be 
encoded into the container token so that the NM understands that a process 
exiting in a particular container need not signal the end of the container, i.e. 
multipleProcesses should not be an explicit flag in the API.

> De-link container life cycle from the process and add ability to execute 
> multiple processes in the same long-lived container
> 
>
> Key: YARN-1040
> URL: https://issues.apache.org/jira/browse/YARN-1040
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>
> The AM should be able to exec >1 process in a container, rather than have the 
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, 
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that 
> something could be run in the container while a long-lived process was 
> already running. This can be useful in monitoring and reconfiguring the 
> long-lived process, as well as shutting it down.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1040) De-link container life cycle from the process and add ability to execute multiple processes in the same long-lived container

2013-12-10 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844595#comment-13844595
 ] 

Hitesh Shah commented on YARN-1040:
---

Sorry - got my wires crossed on the different jiras going around. To clarify, I 
believe container leases for multiple processes are a good feature to have. 
Allowing a container to be launched without a process should be a no-no. 
Resource delegation as mentioned in YARN-1404 seems to be a decent approach to 
assigning resources to other containers - however, it should be restricted 
to assigning resources to containers under the control of YARN.



> De-link container life cycle from the process and add ability to execute 
> multiple processes in the same long-lived container
> 
>
> Key: YARN-1040
> URL: https://issues.apache.org/jira/browse/YARN-1040
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>
> The AM should be able to exec >1 process in a container, rather than have the 
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, 
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that 
> something could be run in the container while a long-lived process was 
> already running. This can be useful in monitoring and reconfiguring the 
> long-lived process, as well as shutting it down.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (YARN-1412) Allocating Containers on a particular Node in Yarn

2013-12-10 Thread Thomas Weise (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Weise updated YARN-1412:
---

Affects Version/s: 2.2.0

> Allocating Containers on a particular Node in Yarn
> --
>
> Key: YARN-1412
> URL: https://issues.apache.org/jira/browse/YARN-1412
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0
> Environment: centos, Hadoop 2.2.0
>Reporter: gaurav gupta
>
> Summary of the problem: 
>  If I pass the node on which I want the container and leave relaxLocality at 
> its default (true), I don't get back a container on the specified node even if 
> the resources are available on that node. It doesn't matter whether I set the 
> rack or not.
> Here is the snippet of the code that I am using
> AMRMClient<ContainerRequest> amRmClient = AMRMClient.createAMRMClient();
> String host = "h1";
> Resource capability = Records.newRecord(Resource.class);
> capability.setMemory(memory);
> nodes = new String[] {host};
> // in order to request a host, we also have to request the rack
> racks = new String[] {"/default-rack"};
> List<ContainerRequest> containerRequests = new ArrayList<ContainerRequest>();
> List<ContainerId> releasedContainers = new ArrayList<ContainerId>();
> containerRequests.add(new ContainerRequest(capability, nodes, racks, 
> Priority.newInstance(priority)));
> if (containerRequests.size() > 0) {
>   LOG.info("Asking RM for containers: " + containerRequests);
>   for (ContainerRequest cr : containerRequests) {
> LOG.info("Requested container: {}", cr.toString());
> amRmClient.addContainerRequest(cr);
>   }
> }
> for (ContainerId containerId : releasedContainers) {
>   LOG.info("Released container, id={}", containerId.getId());
>   amRmClient.releaseAssignedContainer(containerId);
> }
> return amRmClient.allocate(0);



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling

2013-12-10 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844740#comment-13844740
 ] 

Sandy Ryza commented on YARN-1404:
--

Arun, I think I agree with most of the above and your proposal makes a lot of 
sense to me.

There are numerous issues to tackle.  On the YARN side:
* YARN has assumed since its inception that a container's resources belong to a 
single application - we are likely to come across many subtle issues when 
rethinking this assumption.
* While YARN has promise as a platform for deploying long-running services, 
that functionality currently isn't stable in the way that much of the rest of 
YARN is.
* Currently preemption means killing a container process - we would need to 
change the way this mechanism works.

On the Datanode/Impala side:
* Rethink the way we deploy these services to allow them to run inside YARN 
containers.

Stepping back a little, YARN does three things:
* Central Scheduling - decides who gets to run and when and where they get to 
do so
* Deployment - ships bits across the cluster and runs container processes
* Enforcement - monitors container processes to make sure they stay within 
scheduled limits

The central scheduling part is the most valuable to a framework like Impala 
because it allows it to truly share resources on a cluster with other 
processing frameworks.  The second two are helpful - they allow us to 
standardize the way work is deployed on a Hadoop cluster - but they don't 
enable anything that is fundamentally impossible without them. While these will 
simplify things in the long term and create a more cohesive platform, Impala 
currently has little tangible to gain by doing deployment and enforcement 
inside YARN.

So, to summarize, I like the idea and would be both happy to see YARN move in 
this direction and to help it do so. However, making Impala-YARN integration 
depend on this fairly involved work would unnecessarily set it back.  In the 
short term, we have proposed a minimally invasive change (making it possible to 
launch containers without starting processes) that would allow YARN to satisfy 
our use case. I am confident that the change poses no risk from a security 
perspective, from a stability perspective, or in terms of detracting from the 
longer-term vision.


> Enable external systems/frameworks to share resources with Hadoop leveraging 
> Yarn resource scheduling
> -
>
> Key: YARN-1404
> URL: https://issues.apache.org/jira/browse/YARN-1404
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.2.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Attachments: YARN-1404.patch
>
>
> Currently Hadoop Yarn expects to manage the lifecycle of the processes its 
> applications run workload in. External frameworks/systems could benefit from 
> sharing resources with other Yarn applications while running their workload 
> within long-running processes owned by the external framework (in other 
> words, running their workload outside of the context of a Yarn container 
> process). 
> Because Yarn provides robust and scalable resource management, it is 
> desirable for some external systems to leverage the resource governance 
> capabilities of Yarn (queues, capacities, scheduling, access control) while 
> supplying their own resource enforcement.
> Impala is an example of such a system. Impala uses Llama 
> (http://cloudera.github.io/llama/) to request resources from Yarn.
> Impala runs an impalad process on every node of the cluster; when a user 
> submits a query, the processing is broken into 'query fragments' which are 
> run in multiple impalad processes leveraging data locality (similar to 
> Map-Reduce Mappers processing a collocated HDFS block of input data).
> The execution of a 'query fragment' requires an amount of CPU and memory in 
> the impalad. The impalad shares the host with other services (HDFS DataNode, 
> Yarn NodeManager, HBase Region Server) and Yarn applications (MapReduce 
> tasks).
> To ensure cluster utilization follows the Yarn scheduler policies and does 
> not overload the cluster nodes, before running a 'query fragment' on a node, 
> Impala requests the required amount of CPU and memory from Yarn. Once the 
> requested CPU and memory have been allocated, Impala runs the 'query 
> fragment', taking care that it does not use more resources than have been 
> allocated. Memory is bookkept per 'query fragment' and the threads used to 
> process the 'query fragment' are placed under a cgroup to contain CPU 
> utilization.
> Today, for all resources that have been requested from the Yarn RM, a 
> (container) process must be started via the corresponding NodeManager.

[jira] [Commented] (YARN-1028) Add FailoverProxyProvider like capability to RMProxy

2013-12-10 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844747#comment-13844747
 ] 

Xuan Gong commented on YARN-1028:
-

Small nit: add
{code}
getRMAdminService(0).transitionToActive(req);
getRMAdminService(1).transitionToStandby(req);
{code}
to
{code}
+  @Test
+  public void testExplicitFailover()
+  throws YarnException, InterruptedException, IOException {
+verifyNodeManagerConnected();
+verifyClientConnection();
+
+// Failover to the second RM
+getRMAdminService(0).transitionToStandby(req);
+getRMAdminService(1).transitionToActive(req);
+
+verifyNodeManagerConnected();
+verifyClientConnection();
+
+// Failover back to the first RM
+verifyNodeManagerConnected();
+verifyClientConnection();
+  }
{code}
so that the test also fails over back to the first RM (see the sketch below).
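
A sketch of the tail of the test with the two suggested lines applied (the earlier part is unchanged from the hunk above):
{code}
// Failover back to the first RM (with the suggested lines added)
getRMAdminService(0).transitionToActive(req);
getRMAdminService(1).transitionToStandby(req);

verifyNodeManagerConnected();
verifyClientConnection();
{code}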

Everything else LGTM.

> Add FailoverProxyProvider like capability to RMProxy
> 
>
> Key: YARN-1028
> URL: https://issues.apache.org/jira/browse/YARN-1028
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Bikas Saha
>Assignee: Karthik Kambatla
> Attachments: yarn-1028-1.patch, yarn-1028-2.patch, yarn-1028-3.patch, 
> yarn-1028-4.patch, yarn-1028-5.patch, yarn-1028-draft-cumulative.patch
>
>
> RMProxy layer currently abstracts RM discovery and implements it by looking 
> up service information from configuration. Motivated by HDFS and using 
> existing classes from Common, we can add failover proxy providers that may 
> provide RM discovery in extensible ways.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1412) Allocating Containers on a particular Node in Yarn

2013-12-10 Thread Thomas Weise (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844750#comment-13844750
 ] 

Thomas Weise commented on YARN-1412:


We implemented it in the AM, tracking resource requests made for a specific 
host with relaxLocality=false and then, if they are not filled by the scheduler 
after n heartbeats, dropping the host constraint and switching to 
relaxLocality=true. We would prefer to leave this to YARN via the combination 
of a specific host and relaxLocality=true, but that does not work.

The requirement is not unique to our application; rather than handling it in 
user land, it would be great to see this work as expected in future YARN 
versions.
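
For illustration, a rough sketch of that AM-side fallback against the Hadoop 2.2 AMRMClient API. The heartbeat budget n, the memory and heartbeatIntervalMs variables, and the priority values are illustrative assumptions, and exception handling is omitted; this is not the actual application code.
{code}
AMRMClient<ContainerRequest> amRmClient = AMRMClient.createAMRMClient();

Resource capability = Records.newRecord(Resource.class);
capability.setMemory(memory);

// 1. Hard constraint: ask for host h1 only, with relaxLocality=false.
ContainerRequest strictRequest = new ContainerRequest(
    capability, new String[] {"h1"}, null, Priority.newInstance(1), false);
amRmClient.addContainerRequest(strictRequest);

int heartbeatsWithoutAllocation = 0;
List<Container> allocated = new ArrayList<Container>();
while (allocated.isEmpty()) {
  AllocateResponse response = amRmClient.allocate(0);
  allocated.addAll(response.getAllocatedContainers());
  if (allocated.isEmpty() && ++heartbeatsWithoutAllocation == n) {
    // 2. After n empty heartbeats, drop the host constraint and re-issue the
    // request with relaxed locality. A different priority is used because
    // locality relaxation must be consistent within a priority level.
    amRmClient.removeContainerRequest(strictRequest);
    amRmClient.addContainerRequest(new ContainerRequest(
        capability, null, null, Priority.newInstance(2))); // relaxLocality defaults to true
  }
  Thread.sleep(heartbeatIntervalMs); // wait for the next heartbeat interval
}
{code}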


> Allocating Containers on a particular Node in Yarn
> --
>
> Key: YARN-1412
> URL: https://issues.apache.org/jira/browse/YARN-1412
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0
> Environment: centos, Hadoop 2.2.0
>Reporter: gaurav gupta
>
> Summary of the problem: 
>  If I pass the node on which I want container and set relax locality default 
> which is true, I don't get back the container on the node specified even if 
> the resources are available on the node. It doesn't matter if I set rack or 
> not.
> Here is the snippet of the code that I am using
> AMRMClient<ContainerRequest> amRmClient = AMRMClient.createAMRMClient();
> String host = "h1";
> Resource capability = Records.newRecord(Resource.class);
> capability.setMemory(memory);
> nodes = new String[] {host};
> // in order to request a host, we also have to request the rack
> racks = new String[] {"/default-rack"};
> List<ContainerRequest> containerRequests = new ArrayList<ContainerRequest>();
> List<ContainerId> releasedContainers = new ArrayList<ContainerId>();
> containerRequests.add(new ContainerRequest(capability, nodes, racks, 
> Priority.newInstance(priority)));
> if (containerRequests.size() > 0) {
>   LOG.info("Asking RM for containers: " + containerRequests);
>   for (ContainerRequest cr : containerRequests) {
> LOG.info("Requested container: {}", cr.toString());
> amRmClient.addContainerRequest(cr);
>   }
> }
> for (ContainerId containerId : releasedContainers) {
>   LOG.info("Released container, id={}", containerId.getId());
>   amRmClient.releaseAssignedContainer(containerId);
> }
> return amRmClient.allocate(0);



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling

2013-12-10 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844772#comment-13844772
 ] 

Vinod Kumar Vavilapalli commented on YARN-1404:
---

Re Tucu's reply

bq. Regarding ACLs and an on/off switch: IMO they are not necessary for the 
following reason. You need an external system installed and running in the node 
to use the resources of an unmanaged container. If you have direct access into 
the node to start the external system, you are 'trusted'. If you don't have 
direct access you cannot use the resources of an unmanaged container.
Unfortunately that is not enough. We are exposing an API on NodeManager that 
anybody can use. The ACL prevents that.

bq. In the case of managed containers we don't have a liveliness 'report' and 
the container process could very well be hung. In such scenario is the 
responsibility of the AM to detected the liveliness of the container process 
and react if it is considered hung.
Like I said, we do have an implicit liveliness report - process liveliness. And 
NodeManager depends on that today to inform the app of container-finishes.

bq. Regarding NM assuming a whole lot of things about containers (3 bullet items): 
for my current use case none of this is needed. It could be relatively easy 
to enable such functionality if a use case that needs it arises.
So, then we start off with the assumption that they are not needed? That 
creates two very different code paths for managed and unmanaged containers. If 
possible we should avoid that.

> Enable external systems/frameworks to share resources with Hadoop leveraging 
> Yarn resource scheduling
> -
>
> Key: YARN-1404
> URL: https://issues.apache.org/jira/browse/YARN-1404
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.2.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Attachments: YARN-1404.patch
>
>
> Currently Hadoop Yarn expects to manage the lifecycle of the processes its 
> applications run workload in. External frameworks/systems could benefit from 
> sharing resources with other Yarn applications while running their workload 
> within long-running processes owned by the external framework (in other 
> words, running their workload outside of the context of a Yarn container 
> process). 
> Because Yarn provides robust and scalable resource management, it is 
> desirable for some external systems to leverage the resource governance 
> capabilities of Yarn (queues, capacities, scheduling, access control) while 
> supplying their own resource enforcement.
> Impala is an example of such system. Impala uses Llama 
> (http://cloudera.github.io/llama/) to request resources from Yarn.
> Impala runs an impalad process in every node of the cluster, when a user 
> submits a query, the processing is broken into 'query fragments' which are 
> run in multiple impalad processes leveraging data locality (similar to 
> Map-Reduce Mappers processing a collocated HDFS block of input data).
> The execution of a 'query fragment' requires an amount of CPU and memory in 
> the impalad. As the impalad shares the host with other services (HDFS 
> DataNode, Yarn NodeManager, Hbase Region Server) and Yarn Applications 
> (MapReduce tasks).
> To ensure cluster utilization that follow the Yarn scheduler policies and it 
> does not overload the cluster nodes, before running a 'query fragment' in a 
> node, Impala requests the required amount of CPU and memory from Yarn. Once 
> the requested CPU and memory has been allocated, Impala starts running the 
> 'query fragment' taking care that the 'query fragment' does not use more 
> resources than the ones that have been allocated. Memory is book kept per 
> 'query fragment' and the threads used for the processing of the 'query 
> fragment' are placed under a cgroup to contain CPU utilization.
> Today, for all resources that have been asked to Yarn RM, a (container) 
> process must be started via the corresponding NodeManager. Failing to do 
> this, will result on the cancelation of the container allocation 
> relinquishing the acquired resource capacity back to the pool of available 
> resources. To avoid this, Impala starts a dummy container process doing 
> 'sleep 10y'.
> Using a dummy container process has its drawbacks:
> * the dummy container process is in a cgroup with a given number of CPU 
> shares that are not used and Impala is re-issuing those CPU shares to another 
> cgroup for the thread running the 'query fragment'. The cgroup CPU 
> enforcement works correctly because of the CPU controller implementation (but 
> the formal specified behavior is actually undefined).
> * Impala may ask for CPU and memory i

[jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling

2013-12-10 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844774#comment-13844774
 ] 

Vinod Kumar Vavilapalli commented on YARN-1404:
---

bq. In this scenario, I think explicitly allowing for delegation of a container 
would solve the problem in a first-class manner.
This is an interesting solution that avoids the problems about trust, 
liveliness reporting and resource limitations' enforcement. +1 for considering 
something like this.

> Enable external systems/frameworks to share resources with Hadoop leveraging 
> Yarn resource scheduling
> -
>
> Key: YARN-1404
> URL: https://issues.apache.org/jira/browse/YARN-1404
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.2.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Attachments: YARN-1404.patch
>
>
> Currently Hadoop Yarn expects to manage the lifecycle of the processes its 
> applications run workload in. External frameworks/systems could benefit from 
> sharing resources with other Yarn applications while running their workload 
> within long-running processes owned by the external framework (in other 
> words, running their workload outside of the context of a Yarn container 
> process). 
> Because Yarn provides robust and scalable resource management, it is 
> desirable for some external systems to leverage the resource governance 
> capabilities of Yarn (queues, capacities, scheduling, access control) while 
> supplying their own resource enforcement.
> Impala is an example of such system. Impala uses Llama 
> (http://cloudera.github.io/llama/) to request resources from Yarn.
> Impala runs an impalad process in every node of the cluster, when a user 
> submits a query, the processing is broken into 'query fragments' which are 
> run in multiple impalad processes leveraging data locality (similar to 
> Map-Reduce Mappers processing a collocated HDFS block of input data).
> The execution of a 'query fragment' requires an amount of CPU and memory in 
> the impalad. As the impalad shares the host with other services (HDFS 
> DataNode, Yarn NodeManager, Hbase Region Server) and Yarn Applications 
> (MapReduce tasks).
> To ensure cluster utilization that follow the Yarn scheduler policies and it 
> does not overload the cluster nodes, before running a 'query fragment' in a 
> node, Impala requests the required amount of CPU and memory from Yarn. Once 
> the requested CPU and memory has been allocated, Impala starts running the 
> 'query fragment' taking care that the 'query fragment' does not use more 
> resources than the ones that have been allocated. Memory is book kept per 
> 'query fragment' and the threads used for the processing of the 'query 
> fragment' are placed under a cgroup to contain CPU utilization.
> Today, for all resources that have been asked to Yarn RM, a (container) 
> process must be started via the corresponding NodeManager. Failing to do 
> this, will result on the cancelation of the container allocation 
> relinquishing the acquired resource capacity back to the pool of available 
> resources. To avoid this, Impala starts a dummy container process doing 
> 'sleep 10y'.
> Using a dummy container process has its drawbacks:
> * the dummy container process is in a cgroup with a given number of CPU 
> shares that are not used and Impala is re-issuing those CPU shares to another 
> cgroup for the thread running the 'query fragment'. The cgroup CPU 
> enforcement works correctly because of the CPU controller implementation (but 
> the formal specified behavior is actually undefined).
> * Impala may ask for CPU and memory independent of each other. Some requests 
> may be only memory with no CPU or viceversa. Because a container requires a 
> process, complete absence of memory or CPU is not possible even if the dummy 
> process is 'sleep', a minimal amount of memory and CPU is required for the 
> dummy process.
> Because of this it is desirable to be able to have a container without a 
> backing process.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling

2013-12-10 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844781#comment-13844781
 ] 

Vinod Kumar Vavilapalli commented on YARN-1404:
---

{quote}
Stepping back a little, YARN does three things:
Central Scheduling - decides who gets to run and when and where they get to do 
so
Deployment - ships bits across the cluster and runs container processes
Enforcement - monitors container processes to make sure they stay within 
scheduled limits
The central scheduling part is the most valuable to a framework like Impala 
because it allows it to truly share resources on a cluster with other 
processing frameworks. The second two are helpful - they allow us to 
standardize the way work is deployed on a Hadoop cluster - but they don't 
enable anything that is fundamentally impossible without them. While these will 
simplify things in the long term and create a more cohesive platform, Impala 
currently has little tangible to gain by doing deployment and enforcement 
inside YARN.
{quote}

I don't agree with that characterization. The thing is, to enable only central 
scheduling, YARN has to give up its control over liveliness & enforcement and 
needs to create a new level of trust. If there are alternative architectures 
that avoid losing that control, YARN will choose those options. The 
question is whether external systems want to take that option or not.

> Enable external systems/frameworks to share resources with Hadoop leveraging 
> Yarn resource scheduling
> -
>
> Key: YARN-1404
> URL: https://issues.apache.org/jira/browse/YARN-1404
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.2.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Attachments: YARN-1404.patch
>
>
> Currently Hadoop Yarn expects to manage the lifecycle of the processes its 
> applications run workload in. External frameworks/systems could benefit from 
> sharing resources with other Yarn applications while running their workload 
> within long-running processes owned by the external framework (in other 
> words, running their workload outside of the context of a Yarn container 
> process). 
> Because Yarn provides robust and scalable resource management, it is 
> desirable for some external systems to leverage the resource governance 
> capabilities of Yarn (queues, capacities, scheduling, access control) while 
> supplying their own resource enforcement.
> Impala is an example of such system. Impala uses Llama 
> (http://cloudera.github.io/llama/) to request resources from Yarn.
> Impala runs an impalad process in every node of the cluster, when a user 
> submits a query, the processing is broken into 'query fragments' which are 
> run in multiple impalad processes leveraging data locality (similar to 
> Map-Reduce Mappers processing a collocated HDFS block of input data).
> The execution of a 'query fragment' requires an amount of CPU and memory in 
> the impalad. As the impalad shares the host with other services (HDFS 
> DataNode, Yarn NodeManager, Hbase Region Server) and Yarn Applications 
> (MapReduce tasks).
> To ensure cluster utilization that follow the Yarn scheduler policies and it 
> does not overload the cluster nodes, before running a 'query fragment' in a 
> node, Impala requests the required amount of CPU and memory from Yarn. Once 
> the requested CPU and memory has been allocated, Impala starts running the 
> 'query fragment' taking care that the 'query fragment' does not use more 
> resources than the ones that have been allocated. Memory is book kept per 
> 'query fragment' and the threads used for the processing of the 'query 
> fragment' are placed under a cgroup to contain CPU utilization.
> Today, for all resources that have been asked to Yarn RM, a (container) 
> process must be started via the corresponding NodeManager. Failing to do 
> this, will result on the cancelation of the container allocation 
> relinquishing the acquired resource capacity back to the pool of available 
> resources. To avoid this, Impala starts a dummy container process doing 
> 'sleep 10y'.
> Using a dummy container process has its drawbacks:
> * the dummy container process is in a cgroup with a given number of CPU 
> shares that are not used and Impala is re-issuing those CPU shares to another 
> cgroup for the thread running the 'query fragment'. The cgroup CPU 
> enforcement works correctly because of the CPU controller implementation (but 
> the formal specified behavior is actually undefined).
> * Impala may ask for CPU and memory independent of each other. Some requests 
> may be only memory with no CPU or viceversa. Because a container requires a 
> pro

[jira] [Created] (YARN-1489) [Umbrella] Work-preserving ApplicationMaster restart

2013-12-10 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-1489:
-

 Summary: [Umbrella] Work-preserving ApplicationMaster restart
 Key: YARN-1489
 URL: https://issues.apache.org/jira/browse/YARN-1489
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli


Today if AMs go down,
 - RM kills all the containers of that ApplicationAttempt
 - New ApplicationAttempt doesn't know where the previous containers are running
 - Old running containers don't know where the new AM is running.

We need to fix this to enable work-preserving AM restart. The latter two can 
potentially be done at the app level, but it is good to have a common 
solution for all apps wherever possible.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits

2013-12-10 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-1490:
-

 Summary: RM should optionally not kill all containers when an 
ApplicationMaster exits
 Key: YARN-1490
 URL: https://issues.apache.org/jira/browse/YARN-1490
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli


This is needed to enable work-preserving AM restart. Some apps can choose to 
reconnect with old running containers; some may not want to. This should be an 
option.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (YARN-1041) RM to bind and notify a restarted AM of existing containers

2013-12-10 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1041:
--

Issue Type: Sub-task  (was: Bug)
Parent: YARN-1489

> RM to bind and notify a restarted AM of existing containers
> ---
>
> Key: YARN-1041
> URL: https://issues.apache.org/jira/browse/YARN-1041
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Jian He
>
> For long lived containers we don't want the AM to be a SPOF.
> When the RM restarts a (failed) AM, it should be given the list of containers 
> it had already been allocated. The AM should then be able to contact the NMs 
> to get details on them. NMs would also need to do any binding of the 
> containers needed to handle a moved/restarted AM.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (YARN-1041) RM to bind and notify a restarted AM of existing containers

2013-12-10 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1041:
--

Issue Type: Bug  (was: Sub-task)
Parent: (was: YARN-896)

> RM to bind and notify a restarted AM of existing containers
> ---
>
> Key: YARN-1041
> URL: https://issues.apache.org/jira/browse/YARN-1041
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Assignee: Jian He
>
> For long lived containers we don't want the AM to be a SPOF.
> When the RM restarts a (failed) AM, it should be given the list of containers 
> it had already been allocated. The AM should then be able to contact the NMs 
> to get details on them. NMs would also need to do any binding of the 
> containers needed to handle a moved/restarted AM.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (YARN-1136) Replace junit.framework.Assert with org.junit.Assert

2013-12-10 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-1136:
--

Assignee: Chen He

> Replace junit.framework.Assert with org.junit.Assert
> 
>
> Key: YARN-1136
> URL: https://issues.apache.org/jira/browse/YARN-1136
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.1.0-beta
>Reporter: Karthik Kambatla
>Assignee: Chen He
>  Labels: newbie, test
>
> There are several places where we are using junit.framework.Assert instead of 
> org.junit.Assert.
> {code}grep -rn "junit.framework.Assert" hadoop-yarn-project/ 
> --include=*.java{code} 
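
For illustration, the change in each affected file is essentially just the import swap; the example call site below is made up:
{code}
// before
import junit.framework.Assert;   // JUnit 3 Assert, deprecated
// after
import org.junit.Assert;         // JUnit 4 Assert

// call sites stay the same, e.g.
Assert.assertEquals("unexpected queue", "default", queueName);
{code}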



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (YARN-1491) Upgrade JUnit3 TestCase to JUnit 4

2013-12-10 Thread Jonathan Eagles (JIRA)
Jonathan Eagles created YARN-1491:
-

 Summary: Upgrade JUnit3 TestCase to JUnit 4
 Key: YARN-1491
 URL: https://issues.apache.org/jira/browse/YARN-1491
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Jonathan Eagles
Assignee: Chen He


There are still four references to test classes that extend 
junit.framework.TestCase (a conversion sketch follows the list):

hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestYarnVersionInfo.java
hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestWindowsResourceCalculatorPlugin.java
hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestLinuxResourceCalculatorPlugin.java
hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/TestWindowsBasedProcessTree.java
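
A minimal sketch of what the conversion looks like for one of these, assuming the usual JUnit 4 idioms; the test method body is illustrative, not the actual test code:
{code}
import static org.junit.Assert.assertNotNull;

import org.apache.hadoop.yarn.util.YarnVersionInfo;
import org.junit.Test;

// Before: "public class TestYarnVersionInfo extends junit.framework.TestCase"
// with test methods discovered by their "testXxx" names.
// After: no TestCase superclass; methods are annotated with @Test and use
// org.junit.Assert instead of the inherited JUnit 3 asserts.
public class TestYarnVersionInfo {

  @Test
  public void testVersionIsReported() {
    assertNotNull("version should be set at build time", YarnVersionInfo.getVersion());
  }
}
{code}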




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (YARN-408) Capacity Scheduler delay scheduling should not be disabled by default

2013-12-10 Thread Mayank Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Bansal updated YARN-408:
---

Attachment: YARN-408-trunk-3.patch

Fixing test.

Thanks,
Mayank

> Capacity Scheduler delay scheduling should not be disabled by default
> -
>
> Key: YARN-408
> URL: https://issues.apache.org/jira/browse/YARN-408
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.0.3-alpha
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
>Priority: Minor
> Attachments: YARN-408-trunk-2.patch, YARN-408-trunk-3.patch, 
> YARN-408-trunk.patch
>
>
> Capacity Scheduler delay scheduling should not be disabled by default.
> Enabling it with the delay set to the number of nodes in one rack.
> Thanks,
> Mayank



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (YARN-1391) Lost node list contains many active node with different port

2013-12-10 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-1391:
--

Attachment: YARN-1391.v1.patch

> Lost node list contains many active node with different port
> 
>
> Key: YARN-1391
> URL: https://issues.apache.org/jira/browse/YARN-1391
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.5-alpha
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: YARN-1391.v1.patch
>
>
> When restarting node manager, the active node list in webUI will contain 
> duplicate entries. Such two entries have the same host name with different 
> port number. After expiry interval, the older entry will get expired and 
> transitioned to lost node list, and stay there until this node gets restarted 
> again.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (YARN-1391) Lost node list should be identify by NodeId

2013-12-10 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-1391:
--

Description: In the case of multiple node managers on a single machine, each of 
them should be identified by its NodeId, which is more specific than just the host name  
(was: When restarting node manager, the active node list in webUI will contain 
duplicate entries. Such two entries have the same host name with different port 
number. After expiry interval, the older entry will get expired and 
transitioned to lost node list, and stay there until this node gets restarted 
again.)
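
A tiny illustration of the point, assuming the NodeId API from org.apache.hadoop.yarn.api.records; the host name and ports are made up:
{code}
// Two node managers (or the same NM before and after a restart) on one machine:
NodeId first  = NodeId.newInstance("nm-host-01", 45454);
NodeId second = NodeId.newInstance("nm-host-01", 45712);

// Keyed by host name alone they collide; keyed by NodeId (host + port) they don't.
boolean sameHost = first.getHost().equals(second.getHost()); // true
boolean sameNode = first.equals(second);                     // false
{code}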

> Lost node list should be identify by NodeId
> ---
>
> Key: YARN-1391
> URL: https://issues.apache.org/jira/browse/YARN-1391
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.5-alpha
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: YARN-1391.v1.patch
>
>
> In the case of multiple node managers on a single machine, each of them should be 
> identified by its NodeId, which is more specific than just the host name



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (YARN-1391) Lost node list should be identify by NodeId

2013-12-10 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-1391:
--

Summary: Lost node list should be identify by NodeId  (was: Lost node list 
contains many active node with different port)

> Lost node list should be identify by NodeId
> ---
>
> Key: YARN-1391
> URL: https://issues.apache.org/jira/browse/YARN-1391
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.5-alpha
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: YARN-1391.v1.patch
>
>
> When restarting node manager, the active node list in webUI will contain 
> duplicate entries. Such two entries have the same host name with different 
> port number. After expiry interval, the older entry will get expired and 
> transitioned to lost node list, and stay there until this node gets restarted 
> again.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-408) Capacity Scheduler delay scheduling should not be disabled by default

2013-12-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844875#comment-13844875
 ] 

Hadoop QA commented on YARN-408:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12618141/YARN-408-trunk-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2641//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2641//console

This message is automatically generated.

> Capacity Scheduler delay scheduling should not be disabled by default
> -
>
> Key: YARN-408
> URL: https://issues.apache.org/jira/browse/YARN-408
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.0.3-alpha
>Reporter: Mayank Bansal
>Assignee: Mayank Bansal
>Priority: Minor
> Attachments: YARN-408-trunk-2.patch, YARN-408-trunk-3.patch, 
> YARN-408-trunk.patch
>
>
> Capacity Scheduler delay scheduling should not be disabled by default.
> Enabling it with the delay set to the number of nodes in one rack.
> Thanks,
> Mayank



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1391) Lost node list should be identify by NodeId

2013-12-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844896#comment-13844896
 ] 

Hadoop QA commented on YARN-1391:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12618147/YARN-1391.v1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2642//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2642//console

This message is automatically generated.

> Lost node list should be identify by NodeId
> ---
>
> Key: YARN-1391
> URL: https://issues.apache.org/jira/browse/YARN-1391
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.0.5-alpha
>Reporter: Siqi Li
>Assignee: Siqi Li
> Attachments: YARN-1391.v1.patch
>
>
> In the case of multiple node managers on a single machine, each of them should be 
> identified by its NodeId, which is more specific than just the host name



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1197) Support changing resources of an allocated container

2013-12-10 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844926#comment-13844926
 ] 

Wangda Tan commented on YARN-1197:
--

Agreed. I also think the scheduler part needs some time to review. I'll create a 
JIRA for the scheduler part and upload a patch (updated against YARN-1447 and 
YARN-1448) and a design doc ASAP.

> Support changing resources of an allocated container
> 
>
> Key: YARN-1197
> URL: https://issues.apache.org/jira/browse/YARN-1197
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: mapreduce-project.patch.ver.1, 
> tools-project.patch.ver.1, yarn-1197-v2.pdf, yarn-1197-v3.pdf, 
> yarn-1197-v4.pdf, yarn-1197-v5.pdf, yarn-1197.pdf, 
> yarn-api-protocol.patch.ver.1, yarn-pb-impl.patch.ver.1, 
> yarn-server-common.patch.ver.1, yarn-server-nodemanager.patch.ver.1, 
> yarn-server-resourcemanager.patch.ver.1
>
>
> The current YARN resource management logic assumes the resource allocated to a 
> container is fixed during its lifetime. When users want to change the resource 
> of an allocated container, the only way is to release it and allocate a new 
> container with the expected size.
> Allowing run-time changes to the resources of an allocated container will give 
> us better control of resource usage on the application side.
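
For illustration, the release-and-reallocate workflow described above looks roughly like the following sketch against the AMRMClient API; existingContainer is assumed to be a previously allocated Container, and the doubled memory size is made up:
{code}
AMRMClient<ContainerRequest> amRmClient = AMRMClient.createAMRMClient();

// 1. Give back the container whose size is no longer right.
amRmClient.releaseAssignedContainer(existingContainer.getId());

// 2. Ask for a replacement with the desired capability.
Resource bigger = Records.newRecord(Resource.class);
bigger.setMemory(existingContainer.getResource().getMemory() * 2);
amRmClient.addContainerRequest(
    new ContainerRequest(bigger, null, null, Priority.newInstance(1)));

// 3. The replacement arrives on a later heartbeat, not necessarily on the same
// node, and any state in the old container is lost.
AllocateResponse response = amRmClient.allocate(0);
List<Container> replacements = response.getAllocatedContainers();
{code}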



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Moved] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2013-12-10 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli moved HADOOP-9639 to YARN-1492:
---

  Component/s: (was: filecache)
Affects Version/s: (was: 2.0.4-alpha)
   2.0.4-alpha
  Key: YARN-1492  (was: HADOOP-9639)
  Project: Hadoop YARN  (was: Hadoop Common)

> truly shared cache for jars (jobjar/libjar)
> ---
>
> Key: YARN-1492
> URL: https://issues.apache.org/jira/browse/YARN-1492
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.0.4-alpha
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: shared_cache_design.pdf, shared_cache_design_v2.pdf, 
> shared_cache_design_v3.pdf, shared_cache_design_v4.pdf
>
>
> Currently there is the distributed cache that enables you to cache jars and 
> files so that attempts from the same job can reuse them. However, sharing is 
> limited with the distributed cache because it is normally on a per-job basis. 
> On a large cluster, sometimes copying of jobjars and libjars becomes so 
> prevalent that it consumes a large portion of the network bandwidth, not to 
> speak of defeating the purpose of "bringing compute to where data is". This 
> is wasteful because in most cases code doesn't change much across many jobs.
> I'd like to propose and discuss feasibility of introducing a truly shared 
> cache so that multiple jobs from multiple users can share and cache jars. 
> This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1492) truly shared cache for jars (jobjar/libjar)

2013-12-10 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844930#comment-13844930
 ] 

Vinod Kumar Vavilapalli commented on YARN-1492:
---

Technical issue. This should be a YARN JIRA. As YARN handles distributed cache, 
it makes sense to have this discussion here. I don't follow the common lists 
much and I almost missed this (it's possible others missed it too because of 
that). If/when we create a branch, let's create it with a YARN JIRA number.

I just moved the JIRA to YARN. Let me know if you disagree.

> truly shared cache for jars (jobjar/libjar)
> ---
>
> Key: YARN-1492
> URL: https://issues.apache.org/jira/browse/YARN-1492
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.0.4-alpha
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Attachments: shared_cache_design.pdf, shared_cache_design_v2.pdf, 
> shared_cache_design_v3.pdf, shared_cache_design_v4.pdf
>
>
> Currently there is the distributed cache that enables you to cache jars and 
> files so that attempts from the same job can reuse them. However, sharing is 
> limited with the distributed cache because it is normally on a per-job basis. 
> On a large cluster, sometimes copying of jobjars and libjars becomes so 
> prevalent that it consumes a large portion of the network bandwidth, not to 
> speak of defeating the purpose of "bringing compute to where data is". This 
> is wasteful because in most cases code doesn't change much across many jobs.
> I'd like to propose and discuss feasibility of introducing a truly shared 
> cache so that multiple jobs from multiple users can share and cache jars. 
> This JIRA is to open the discussion.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling

2013-12-10 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844954#comment-13844954
 ] 

Sandy Ryza commented on YARN-1404:
--

bq. The thing is to enable only central scheduling, YARN has to give up its 
control over liveliness & enforcement and needs to create a new level of trust.
I'm not sure I entirely understand what you mean by create a new level of 
trust.  We are a long way from YARN managing all resources on a Hadoop cluster. 
 YARN implicitly understands that other trusted processes will be running 
alongside it.  The proposed change does not grant any users the ability to use 
any resources without going through a framework trusted by the cluster 
administrator.

bq. Like I said, we do have an implicit liveliness report - process liveliness. 
And NodeManager depends on that today to inform the app of container-finishes.
It depends on that or the AM releasing the resources.  Process liveliness is a 
very imperfect signifier - a process can stick around due to an 
accidentally-not-finished-thread even when all its work is done.  I have seen 
clusters where all MR task processes are killed by the AM without exiting 
naturally and everything works fine.

I've tried to think through situations where this could be harmful:
* Malicious application intentionally sits on cluster resources: it can do this 
already by running a process with sleep(infinity).
* Application unintentionally sits on cluster resources: this can already happen 
if a container process forgets to terminate a non-daemon thread.

In both cases, preemption will prevent an application from sitting on 
resources above its fair share. 

Is there a scenario I'm missing here?

bq. If there are alternative architectures that will avoid losing that control, 
YARN will chose those options.
YARN is not a power-hungry conscious entity that gets to make decisions for us. 
 We as YARN committers and contributors get to decide what use cases we want to 
support, and we don't need to choose a single one.  We should of course be 
careful with what we choose to support, but we should only be restrictive when there 
are concrete consequences of doing otherwise, not simply when a use case 
violates the abstract idea of YARN controlling everything.

If the deeper concern is that Impala and similar frameworks will opt not to run 
fully inside YARN when that functionality is available, I think we would be 
happy to switch over when YARN supports this in a stable manner.  However, I 
believe this is a long way away and depending on that work is not an option for 
us.

> Enable external systems/frameworks to share resources with Hadoop leveraging 
> Yarn resource scheduling
> -
>
> Key: YARN-1404
> URL: https://issues.apache.org/jira/browse/YARN-1404
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.2.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Attachments: YARN-1404.patch
>
>
> Currently Hadoop Yarn expects to manage the lifecycle of the processes its 
> applications run workload in. External frameworks/systems could benefit from 
> sharing resources with other Yarn applications while running their workload 
> within long-running processes owned by the external framework (in other 
> words, running their workload outside of the context of a Yarn container 
> process). 
> Because Yarn provides robust and scalable resource management, it is 
> desirable for some external systems to leverage the resource governance 
> capabilities of Yarn (queues, capacities, scheduling, access control) while 
> supplying their own resource enforcement.
> Impala is an example of such system. Impala uses Llama 
> (http://cloudera.github.io/llama/) to request resources from Yarn.
> Impala runs an impalad process in every node of the cluster, when a user 
> submits a query, the processing is broken into 'query fragments' which are 
> run in multiple impalad processes leveraging data locality (similar to 
> Map-Reduce Mappers processing a collocated HDFS block of input data).
> The execution of a 'query fragment' requires an amount of CPU and memory in 
> the impalad. As the impalad shares the host with other services (HDFS 
> DataNode, Yarn NodeManager, Hbase Region Server) and Yarn Applications 
> (MapReduce tasks).
> To ensure cluster utilization that follow the Yarn scheduler policies and it 
> does not overload the cluster nodes, before running a 'query fragment' in a 
> node, Impala requests the required amount of CPU and memory from Yarn. Once 
> the requested CPU and memory has been allocated, Impala starts running the 
> 'query fragment' taking care that the 'query fragment' does not use more 
> resource

[jira] [Commented] (YARN-1404) Enable external systems/frameworks to share resources with Hadoop leveraging Yarn resource scheduling

2013-12-10 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845011#comment-13845011
 ] 

Vinod Kumar Vavilapalli commented on YARN-1404:
---

bq. I'm not sure I entirely understand what you mean by create a new level of 
trust.
I thought that was already clear to everyone. See my comment 
[here|https://issues.apache.org/jira/browse/YARN-1404?focusedCommentId=13840905&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13840905].
 "YARN depends on the ability to enforce resource-usage restrictions".

YARN enables both resource scheduling and enforcement of those scheduling 
decisions. If resources sit outside of YARN, YARN cannot enforce the limits on 
their usage. For example, YARN cannot enforce the memory usage of a DataNode. 
People may work around it by setting up Cgroups on these daemons, but that 
defeats the purpose of YARN in the first place. That is why I earlier proposed 
that impala/datanode run under YARN. When I couldn't find a solution otherwise, 
I revised my proposal to restrict it to be used with a special ACL so that 
other apps don't abuse the cluster by requesting unmanaged containers and not 
using those resources.

bq. It depends on that or the AM releasing the resources. Process liveliness is 
a very imperfect signifier ...
We cannot trust AMs to always release containers. If it were so imperfect, we 
should change YARN as it is today to not depend on liveliness. I'd leave it as 
an exercise to see how, once we remove process-liveliness in general, apps will 
release containers and how clusters get utilized. Bonus points for trying it on 
a shared multi-tenant cluster with user-written YARN apps.

My point is that process liveliness + accounting based on that is a 
well-understood model in Hadoop land. The proposal for leases is to continue 
that.

bq. Is there a scenario I'm missing here?
One example illustrates this. Today AMs can go away without releasing 
containers, and YARN can kill the corresponding containers (as they are managed). 
If we don't have some kind of lease, and AMs holding unmanaged resources go 
away without an explicit container-release, those resources are leaked.

bq. YARN is not a power-hungry conscious entity that gets to make decisions for 
us. Not simply when a use case violates the abstract idea of YARN controlling 
everything. [...]
Of course, when I say YARN, I mean the YARN community. You take it too 
literally.

I was pointing out your statements about "Impala currently has little tangible 
to gain by doing deployment and enforcement inside YARN", "However, making 
Impala-YARN integration depend on this fairly involved work would unnecessarily 
set it back". YARN community doesn't take decisions based on those things.

Overall, I didn't originally have a complete solution for making it happen - so 
I came up with ACLs and leases. But delegation as proposed by Arun seems like one 
that solves all the problems.  Other than saying you don't want to wait for 
impala-under-YARN integration, I haven't heard any technical reservations 
against this approach.

> Enable external systems/frameworks to share resources with Hadoop leveraging 
> Yarn resource scheduling
> -
>
> Key: YARN-1404
> URL: https://issues.apache.org/jira/browse/YARN-1404
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.2.0
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
> Attachments: YARN-1404.patch
>
>
> Currently Hadoop Yarn expects to manage the lifecycle of the processes its 
> applications run workload in. External frameworks/systems could benefit from 
> sharing resources with other Yarn applications while running their workload 
> within long-running processes owned by the external framework (in other 
> words, running their workload outside of the context of a Yarn container 
> process). 
> Because Yarn provides robust and scalable resource management, it is 
> desirable for some external systems to leverage the resource governance 
> capabilities of Yarn (queues, capacities, scheduling, access control) while 
> supplying their own resource enforcement.
> Impala is an example of such system. Impala uses Llama 
> (http://cloudera.github.io/llama/) to request resources from Yarn.
> Impala runs an impalad process in every node of the cluster, when a user 
> submits a query, the processing is broken into 'query fragments' which are 
> run in multiple impalad processes leveraging data locality (similar to 
> Map-Reduce Mappers processing a collocated HDFS block of input data).
> The execution of a 'query fragment' requires an amount of CPU and memory in 
> the impalad. As the impal

[jira] [Reopened] (YARN-1121) RMStateStore should flush all pending store events before closing

2013-12-10 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He reopened YARN-1121:
---


> RMStateStore should flush all pending store events before closing
> -
>
> Key: YARN-1121
> URL: https://issues.apache.org/jira/browse/YARN-1121
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Bikas Saha
>Assignee: Jian He
> Fix For: 2.4.0
>
> Attachments: YARN-1121.1.patch, YARN-1121.2.patch, YARN-1121.2.patch, 
> YARN-1121.3.patch, YARN-1121.4.patch, YARN-1121.5.patch, YARN-1121.6.patch, 
> YARN-1121.6.patch, YARN-1121.7.patch
>
>
> on serviceStop it should wait for all internal pending events to drain before 
> stopping.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (YARN-1121) RMStateStore should flush all pending store events before closing

2013-12-10 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1121:
--

Attachment: YARN-1121.8.patch

Thanks for pointing that out. Fixed the issue.

> RMStateStore should flush all pending store events before closing
> -
>
> Key: YARN-1121
> URL: https://issues.apache.org/jira/browse/YARN-1121
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Bikas Saha
>Assignee: Jian He
> Fix For: 2.4.0
>
> Attachments: YARN-1121.1.patch, YARN-1121.2.patch, YARN-1121.2.patch, 
> YARN-1121.3.patch, YARN-1121.4.patch, YARN-1121.5.patch, YARN-1121.6.patch, 
> YARN-1121.6.patch, YARN-1121.7.patch, YARN-1121.8.patch
>
>
> on serviceStop it should wait for all internal pending events to drain before 
> stopping.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1121) RMStateStore should flush all pending store events before closing

2013-12-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845060#comment-13845060
 ] 

Hadoop QA commented on YARN-1121:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12618174/YARN-1121.8.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/2643//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/2643//console

This message is automatically generated.

> RMStateStore should flush all pending store events before closing
> -
>
> Key: YARN-1121
> URL: https://issues.apache.org/jira/browse/YARN-1121
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.1.0-beta
>Reporter: Bikas Saha
>Assignee: Jian He
> Fix For: 2.4.0
>
> Attachments: YARN-1121.1.patch, YARN-1121.2.patch, YARN-1121.2.patch, 
> YARN-1121.3.patch, YARN-1121.4.patch, YARN-1121.5.patch, YARN-1121.6.patch, 
> YARN-1121.6.patch, YARN-1121.7.patch, YARN-1121.8.patch
>
>
> on serviceStop it should wait for all internal pending events to drain before 
> stopping.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (YARN-1311) Fix app specific scheduler-events' names to be app-attempt based

2013-12-10 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1311:
--

Issue Type: Sub-task  (was: Bug)
Parent: YARN-1489

> Fix app specific scheduler-events' names to be app-attempt based
> 
>
> Key: YARN-1311
> URL: https://issues.apache.org/jira/browse/YARN-1311
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Trivial
> Attachments: YARN-1311-20131015.txt
>
>
> Today, APP_ADDED and APP_REMOVED are sent to the scheduler. They are 
> misnomers as schedulers only deal with AppAttempts today. This JIRA is for 
> fixing their names so that we can add App-level events in the near future, 
> notably for work-preserving RM-restart.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (YARN-1493) Separate app-level handling logic in scheduler

2013-12-10 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1493:
--

Description: Today, the scheduler is tied to the attempt only. We can add new 
app-level events to the scheduler and separate the app-level logic out. This is 
good for work-preserving AM restart and RM restart, and is also needed for 
differentiating app-level metrics from attempt-level metrics.

> Separate app-level handling logic in scheduler 
> ---
>
> Key: YARN-1493
> URL: https://issues.apache.org/jira/browse/YARN-1493
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
>
> Today, the scheduler is tied to the attempt only. We can add new app-level events to 
> the scheduler and separate the app-level logic out. This is good for 
> work-preserving AM restart and RM restart, and is also needed for differentiating 
> app-level metrics from attempt-level metrics.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (YARN-1493) Separate app-level handling logic in scheduler

2013-12-10 Thread Jian He (JIRA)
Jian He created YARN-1493:
-

 Summary: Separate app-level handling logic in scheduler 
 Key: YARN-1493
 URL: https://issues.apache.org/jira/browse/YARN-1493
 Project: Hadoop YARN
  Issue Type: Sub-task
 Environment: Today, scheduler is tied to attempt only. We can add new 
app-level events to the scheduler and separate the app-level logic out. This is 
good for work-preserving AM restart, RM restart, and also needed for 
differentiating app-level metrics and attempt-level metrics.
Reporter: Jian He
Assignee: Jian He






--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (YARN-1493) Separate app-level handling logic in scheduler

2013-12-10 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-1493:
--

Environment: (was: Today, scheduler is tied to attempt only. We can add 
new app-level events to the scheduler and separate the app-level logic out. 
This is good for work-preserving AM restart, RM restart, and also needed for 
differentiating app-level metrics and attempt-level metrics.)

> Separate app-level handling logic in scheduler 
> ---
>
> Key: YARN-1493
> URL: https://issues.apache.org/jira/browse/YARN-1493
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jian He
>Assignee: Jian He
>




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1489) [Umbrella] Work-preserving ApplicationMaster restart

2013-12-10 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845124#comment-13845124
 ] 

Bikas Saha commented on YARN-1489:
--

Would be good to see an overall design document, specially for the tricky 
pieces like reconnecting existing running containers to new app attempts.

> [Umbrella] Work-preserving ApplicationMaster restart
> 
>
> Key: YARN-1489
> URL: https://issues.apache.org/jira/browse/YARN-1489
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>
> Today if AMs go down,
>  - RM kills all the containers of that ApplicationAttempt
>  - New ApplicationAttempt doesn't know where the previous containers are 
> running
>  - Old running containers don't know where the new AM is running.
> We need to fix this to enable work-preserving AM restart. The latter two can 
> potentially be done at the app level, but it is good to have a common 
> solution for all apps wherever possible.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (YARN-1363) Get / Cancel / Renew delegation token api should be non blocking

2013-12-10 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1363:
--

Attachment: YARN-1363.2.patch

I've drafted an initial patch which only contains the production code. Here are 
the important changes (a rough sketch of the polling pattern follows the list):

1. Storing and removing DTs are changed to be async in RMStateStore, which 
notifies RMDelegationTokenSecretManager of operation completion.

2. Updating DTs is added to RMStateStore, so that RMStateStore can send a 
separate update-completion notification, not to be confused with the 
storing/removing completion notifications.

3. RMDelegationTokenSecretManager handles the completion notifications from 
RMStateStore.

4. RMStateStore maintains a map of outstanding DT operations.

5. ClientRMService is changed to check whether the operation is still in 
progress, and to poll the result only when the operation is finished.

6. Update the javadoc in ApplicationClientProtocol.

7. Update YarnClientImpl accordingly. One finding in YarnClient is that 
canceling/renewing DTs are not wrapped.
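
To make the in-progress/poll interaction concrete, here is a rough, self-contained sketch of the pattern; the class and method names are illustrative only and are not the actual RMStateStore/ClientRMService code:
{code}
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncTokenStoreSketch {
  // outstanding DT operations: operation id -> pending result
  private final ConcurrentMap<Long, Future<Void>> outstanding =
      new ConcurrentHashMap<Long, Future<Void>>();
  private final ExecutorService storeExecutor = Executors.newSingleThreadExecutor();
  private long nextOpId = 0;

  // "Store/remove/update DT" becomes: enqueue the write and return immediately.
  public synchronized long submitOperation(Callable<Void> storeOp) {
    long opId = nextOpId++;
    outstanding.put(opId, storeExecutor.submit(storeOp));
    return opId;
  }

  // Client-facing side: non-blocking check whether the operation is still in progress.
  public boolean isFinished(long opId) {
    Future<Void> f = outstanding.get(opId);
    return f != null && f.isDone();
  }

  // Poll the result only once the operation has finished; remove it from the map
  // so that finished-but-never-polled entries are all a cleanup thread has to handle.
  public void pollResult(long opId) throws Exception {
    Future<Void> f = outstanding.remove(opId);
    if (f != null) {
      f.get(); // surfaces any failure from the state-store write
    }
  }
}
{code}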

> Get / Cancel / Renew delegation token api should be non blocking
> 
>
> Key: YARN-1363
> URL: https://issues.apache.org/jira/browse/YARN-1363
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Omkar Vinit Joshi
>Assignee: Zhijie Shen
> Attachments: YARN-1363.1.patch, YARN-1363.2.patch
>
>
> Today GetDelegationToken, CancelDelegationToken and RenewDelegationToken are 
> all blocking APIs.
> * As a part of these calls we try to update RMStateStore, and that may slow it 
> down.
> * Since we have a limited number of client request handlers, we may fill up the 
> client handlers quickly.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (YARN-1363) Get / Cancel / Renew delegation token api should be non blocking

2013-12-10 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845165#comment-13845165
 ] 

Zhijie Shen commented on YARN-1363:
---

8. RMDelegationTokenSecretManager has a cleanup thread to clean up outstanding 
DT operations that have finished but whose results have never been polled by 
the client, which is possible if the client crashes.

> Get / Cancel / Renew delegation token api should be non blocking
> 
>
> Key: YARN-1363
> URL: https://issues.apache.org/jira/browse/YARN-1363
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Omkar Vinit Joshi
>Assignee: Zhijie Shen
> Attachments: YARN-1363.1.patch, YARN-1363.2.patch
>
>
> Today GetDelegationToken, CancelDelegationToken and RenewDelegationToken are 
> all blocking APIs.
> * As a part of these calls we try to update RMStateStore, and that may slow it 
> down.
> * Since we have a limited number of client request handlers, we may fill up the 
> client handlers quickly.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)