[jira] [Commented] (YARN-2716) Refactor ZKRMStateStore retry code with Apache Curator

2015-06-04 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573911#comment-14573911
 ] 

Jian He commented on YARN-2716:
---

Thanks Karthik for working on this! This simplifies things a lot. 
Mostly good; a few comments and questions:
- These two booleans are not used; maybe remove them.
{{private boolean create = false, delete = false;}}
- Is this going to be done in this jira?
{code}
// TODO: Check deleting appIdRemovePath works recursively
safeDelete(appIdRemovePath);
{code}
- Will safeDelete throw a NoNodeException if it deletes a non-existing znode?
- {{new RetryNTimes(numRetries, zkSessionTimeout / numRetries));}}: I think 
the second parameter should be zkRetryInterval. Also, I have a question about why 
zkRetryInterval is calculated as below in the HA case (see the sketch at the end of these comments):
{code}
if (HAUtil.isHAEnabled(conf)) {
  zkRetryInterval = zkSessionTimeout / numRetries;
{code}

- I found this 
[thread|http://mail-archives.apache.org/mod_mbox/curator-user/201410.mbox/%3cd076bc8e.9ef1%25sreichl...@chegg.com%3E]
 saying that blockUntilConnected does not need to be called. Supposing it is needed, I 
think the zkSessionTimeout value is too small; it should be 
numRetries * zkRetryInterval, otherwise the RM will exit after retrying for only about 10s by 
default.
{code}
if (!curatorFramework.blockUntilConnected(
    zkSessionTimeout, TimeUnit.MILLISECONDS)) {
  LOG.fatal("Couldn't establish connection to ZK server");
  throw new YarnRuntimeException("Couldn't connect to ZK server");
}
{code}
- Remove this?
{code}
//  @Override
//  public ZooKeeper getNewZooKeeper() throws IOException {
//return client;
//  }
{code}
- I think testZKSessionTimeout may be removed too? It looks like a test for 
Curator itself.
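
For reference, here is a minimal sketch (not the actual patch) of how the Curator client could be wired so that the retry interval and the blockUntilConnected timeout stay consistent; the variable names (zkHostPort, zkSessionTimeout, numRetries, zkRetryInterval) are assumptions based on the discussion above:
{code}
import java.io.IOException;
import java.util.concurrent.TimeUnit;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.RetryNTimes;

// Sketch only: names and values are illustrative, not the actual patch.
final class CuratorClientSketch {
  static CuratorFramework createCuratorClient(String zkHostPort,
      int zkSessionTimeout, int numRetries, int zkRetryInterval)
      throws IOException, InterruptedException {
    CuratorFramework client = CuratorFrameworkFactory.builder()
        .connectString(zkHostPort)
        .sessionTimeoutMs(zkSessionTimeout)
        // Sleep zkRetryInterval ms between each of numRetries attempts.
        .retryPolicy(new RetryNTimes(numRetries, zkRetryInterval))
        .build();
    client.start();
    // Wait up to the total retry budget rather than a single session timeout
    // before giving up on the initial connection.
    if (!client.blockUntilConnected(
        numRetries * zkRetryInterval, TimeUnit.MILLISECONDS)) {
      throw new IOException("Couldn't connect to ZK server");
    }
    return client;
  }
}
{code}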


> Refactor ZKRMStateStore retry code with Apache Curator
> --
>
> Key: YARN-2716
> URL: https://issues.apache.org/jira/browse/YARN-2716
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Karthik Kambatla
> Attachments: yarn-2716-1.patch, yarn-2716-prelim.patch, 
> yarn-2716-prelim.patch, yarn-2716-super-prelim.patch
>
>
> Per suggestion by [~kasha] in YARN-2131,  it's nice to use curator to 
> simplify the retry logic in ZKRMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3508) Preemption processing occuring on the main RM dispatcher

2015-06-04 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573878#comment-14573878
 ] 

Wangda Tan commented on YARN-3508:
--

Trying to better understand this problem: I'm not sure where the bottleneck is. If 
the CapacityScheduler becomes the bottleneck, moving preemption events out of the main RM 
dispatcher doesn't help. This approach only helps when the main dispatcher is the 
bottleneck.

A parallel thing we can do is to optimize the number of preemption events. 
Currently, while a container sits in the to-preempt list and before it actually gets preempted, 
an event is sent to the scheduler every few seconds; we could reduce the 
frequency of this event (a rough throttling sketch follows below).
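
One possible way to reduce that frequency, as a purely hypothetical sketch (the class and method names here are illustrative and not existing YARN APIs), is to remember when a preemption event was last sent for each container and skip re-sending until a minimum interval has elapsed:
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical throttle: not part of any patch or of the YARN API.
class PreemptionEventThrottle {
  private final long minIntervalMs;
  private final Map<String, Long> lastSentMs = new ConcurrentHashMap<>();

  PreemptionEventThrottle(long minIntervalMs) {
    this.minIntervalMs = minIntervalMs;
  }

  /** Returns true if an event for this container should be sent now. */
  boolean shouldSend(String containerId, long nowMs) {
    Long last = lastSentMs.get(containerId);
    if (last != null && nowMs - last < minIntervalMs) {
      return false;  // sent recently, skip this round
    }
    lastSentMs.put(containerId, nowMs);
    return true;
  }

  /** Forget a container once it is preempted or leaves the to-preempt list. */
  void remove(String containerId) {
    lastSentMs.remove(containerId);
  }
}
{code}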

> Preemption processing occuring on the main RM dispatcher
> 
>
> Key: YARN-3508
> URL: https://issues.apache.org/jira/browse/YARN-3508
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, scheduler
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
> Attachments: YARN-3508.002.patch, YARN-3508.01.patch
>
>
> We recently saw the RM for a large cluster lag far behind on the 
> AsyncDispacher event queue.  The AsyncDispatcher thread was consistently 
> blocked on the highly-contended CapacityScheduler lock trying to dispatch 
> preemption-related events for RMContainerPreemptEventDispatcher.  Preemption 
> processing should occur on the scheduler event dispatcher thread or a 
> separate thread to avoid delaying the processing of other events in the 
> primary dispatcher queue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-06-04 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573830#comment-14573830
 ] 

Zhijie Shen commented on YARN-3051:
---

[~varun_saxena], thanks for working on the new patch. It seems to be a complete 
reader-side prototype, which is nice. I still need some time to take a thorough 
look, but I'd like to share my thoughts about the reader APIs.

IMHO, we may want to have, or at least start with, two sets of APIs: 1) the APIs to query 
the raw data and 2) the APIs to query the aggregated data (a rough interface sketch follows at the end of this comment).

1) APIs to query the raw data:

We would like APIs that let users zoom into the details of their 
jobs, and give users the freedom to fetch the raw data and do customized 
processing that ATS will not do. For example, Hive/Pig on Tez need this set of 
APIs to get the framework-specific data, process it, and render it in their own 
web UI. We basically need 2 such APIs.

a. Get a single entity given an ID that uniquely locates the entity in the 
backend (We assume the uniqueness is assured somehow). 
* This API can be extended or split into multiple sub-APIs to get a single 
element of the entity, such as events, metrics and configuration.

b. Search for a set of entities that match the given predicates.
* We can start from the predicates that we used in ATS v1 (also for 
compatibility purposes), but some of them may no longer apply.
* We may want to add more predicates to cover the newly added elements in v2.
* With more predefined semantics, we can even query entities that belong to 
some container/attempt/application and so on.

2) APIs to query the aggregated data

These are completely new in v2 and are its advantage. With aggregation, we 
can answer statistical questions about the job, the user, the queue, the 
flow and the cluster. These APIs do not direct users to the individual 
entities put by the application, but return statistical data (carried by 
Application|User|Queue|Flow|ClusterEntity). 

a. Get the aggregated data at a certain level given the ID of the concept on that 
level, i.e., the job, the user, the queue, the flow or the cluster.

b. Search for the jobs, the users, the queues, the flows and the clusters 
given predicates.
* For the predicates, we could learn from the examples in hRaven.
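
To make the split concrete, here is a very rough, hypothetical interface sketch; none of these type or method names come from the patch, they only illustrate the two sets of APIs (raw-entity queries vs. aggregation queries) described above:
{code}
import java.io.IOException;
import java.util.Map;
import java.util.Set;

// Illustration only: names and signatures are not from the patch.
interface TimelineReaderSketch {

  // 1a. Fetch one raw entity by a backend-unique ID.
  Map<String, Object> getEntity(String entityType, String entityId)
      throws IOException;

  // 1b. Search raw entities by predicates (filters on events, metrics,
  // configs, relations to container/attempt/application, ...).
  Set<Map<String, Object>> getEntities(String entityType,
      Map<String, Object> predicates) throws IOException;

  // 2a. Fetch aggregated data for one concept (flow, user, queue, ...).
  Map<String, Object> getAggregate(String level, String id)
      throws IOException;

  // 2b. Search aggregates at a level by predicates (hRaven-style queries).
  Set<Map<String, Object>> getAggregates(String level,
      Map<String, Object> predicates) throws IOException;
}
{code}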


> [Storage abstraction] Create backing storage read interface for ATS readers
> ---
>
> Key: YARN-3051
> URL: https://issues.apache.org/jira/browse/YARN-3051
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-3051-YARN-2928.003.patch, 
> YARN-3051-YARN-2928.03.patch, YARN-3051.wip.02.YARN-2928.patch, 
> YARN-3051.wip.patch, YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be 
> implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3706) Generalize native HBase writer for additional tables

2015-06-04 Thread Joep Rottinghuis (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joep Rottinghuis updated YARN-3706:
---
Attachment: YARN-3726-YARN-2928.005.patch

Uploading YARN-3726-YARN-2928.005.patch

Added proper encoding and decoding of column names and values where a splitter 
is used. We now also encode spaces in the column names, and properly decode 
them on the way out (a rough sketch of the idea follows at the end of this comment).

Fixed TestHBaseTimelineWriterImpl to confirm that configs now properly work as 
well.
Still need to add reading of metrics, fix a unit test for the older join method 
(with null as separator), and add an entity reader that creates an 
entire entity object from a scan result.
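
For illustration only, here is a minimal, hypothetical sketch of the escape-and-join idea; the separator character and escape sequences below are assumptions and not the ones used in the patch:
{code}
// Hypothetical encoding: escape the separator and spaces in each component
// so that joining and splitting on the separator round-trips safely.
final class ColumnNameCodec {
  private static final String SEPARATOR = "!";

  static String encode(String component) {
    return component
        .replace("%", "%25")     // escape the escape character first
        .replace(SEPARATOR, "%21")
        .replace(" ", "%20");
  }

  static String decode(String encoded) {
    return encoded
        .replace("%20", " ")
        .replace("%21", SEPARATOR)
        .replace("%25", "%");
  }

  static String join(String... components) {
    StringBuilder sb = new StringBuilder();
    for (String c : components) {
      if (sb.length() > 0) {
        sb.append(SEPARATOR);
      }
      sb.append(encode(c));
    }
    return sb.toString();
  }
}
{code}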

> Generalize native HBase writer for additional tables
> 
>
> Key: YARN-3706
> URL: https://issues.apache.org/jira/browse/YARN-3706
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Joep Rottinghuis
>Assignee: Joep Rottinghuis
>Priority: Minor
> Attachments: YARN-3706-YARN-2928.001.patch, 
> YARN-3726-YARN-2928.002.patch, YARN-3726-YARN-2928.003.patch, 
> YARN-3726-YARN-2928.004.patch, YARN-3726-YARN-2928.005.patch
>
>
> When reviewing YARN-3411 we noticed that we could change the class hierarchy 
> a little in order to accommodate additional tables easily.
> In order to get ready for benchmark testing we left the original layout in 
> place, as performance would not be impacted by the code hierarchy.
> Here is a separate jira to address the hierarchy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3453) Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator even in DRF mode causing thrashing

2015-06-04 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573754#comment-14573754
 ] 

Karthik Kambatla commented on YARN-3453:


A few comments:
# The new imports in FairScheduler and FSLeafQueue are not required.
# Looking at the remaining uses of DefaultResourceCalculator in FairScheduler, 
could we benefit from updating all of them to DominantResourceCalculator? 
[~ashwinshankar77] - do you concur? 
# In FairScheduler, changing the scope of RESOURCE_CALCULATOR and 
DOMINANT_RESOURCE_CALCULATOR is not required.
# We should add unit tests to avoid regressions in the future. 
# Nit: In each of the policies, my preference would be to not make the calculator 
and comparator members static unless required. We have had cases where our 
tests would create multiple instances of the class, leading to issues. Not that 
I foresee multiple instantiations of these classes, but I would like to avoid it 
if we can.

> Fair Scheduler : Parts of preemption logic uses DefaultResourceCalculator 
> even in DRF mode causing thrashing
> 
>
> Key: YARN-3453
> URL: https://issues.apache.org/jira/browse/YARN-3453
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: Ashwin Shankar
>Assignee: Arun Suresh
> Attachments: YARN-3453.1.patch, YARN-3453.2.patch
>
>
> There are two places in preemption code flow where DefaultResourceCalculator 
> is used, even in DRF mode.
> This basically results in more resources getting preempted than needed, and 
> those extra preempted containers aren’t even getting to the “starved” queue, 
> since the scheduling logic is based on DRF's calculator.
> Following are the two places :
> 1. {code:title=FSLeafQueue.java|borderStyle=solid}
> private boolean isStarved(Resource share)
> {code}
> A queue shouldn’t be marked as “starved” if the dominant resource usage
> is >=  fair/minshare.
> 2. {code:title=FairScheduler.java|borderStyle=solid}
> protected Resource resToPreempt(FSLeafQueue sched, long curTime)
> {code}
> --
> One more thing that I believe needs to change in DRF mode: during a 
> preemption round, if preempting a few containers results in satisfying the needs 
> of a resource type, then we should exit that preemption round, since the 
> containers that we just preempted should bring the dominant resource usage to 
> min/fair share.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3768) Index out of range exception with environment variables without values

2015-06-04 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu reassigned YARN-3768:
---

Assignee: zhihai xu

> Index out of range exception with environment variables without values
> --
>
> Key: YARN-3768
> URL: https://issues.apache.org/jira/browse/YARN-3768
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.5.0
>Reporter: Joe Ferner
>Assignee: zhihai xu
>
> Looking at line 80 of org.apache.hadoop.yarn.util.Apps, an index out of range 
> exception occurs if an environment variable is encountered without a value.
> I believe this occurs because Java will not return trailing empty strings from the 
> split method. Similar to this: 
> http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor

2015-06-04 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573745#comment-14573745
 ] 

zhihai xu commented on YARN-3745:
-

Sorry, there's one more thing I forgot to mention: can we rename 
{{initExceptionWithConstructor}} to {{instantiateExceptionImpl}}?

> SerializedException should also try to instantiate internal exception with 
> the default constructor
> --
>
> Key: YARN-3745
> URL: https://issues.apache.org/jira/browse/YARN-3745
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
> Attachments: YARN-3745.1.patch, YARN-3745.patch
>
>
> While deserialising a SerializedException, it tries to create the internal 
> exception in instantiateException() with cn = 
> cls.getConstructor(String.class).
> If cls does not have a constructor with a String parameter, it throws 
> NoSuchMethodException, for example for the ClosedChannelException class.
> We should also try to instantiate the exception with the default constructor so that 
> the inner exception can be propagated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3745) SerializedException should also try to instantiate internal exception with the default constructor

2015-06-04 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573724#comment-14573724
 ] 

zhihai xu commented on YARN-3745:
-

[~lavkesh], thanks for working on this issue. This looks like a good catch.
One question about the patch: why retry on SecurityException? Would retrying 
on NoSuchMethodException alone be enough?
If we need to retry on SecurityException, can we add a test case for it?
There is a typo in the comment {{This does not has constructor with String 
argument}}: it should be {{have}} instead of {{has}}.
Also, could we make the comment {{Try with String constructor if it fails try 
with default.}} clearer, e.g. 
{{Try constructor with String argument, if it fails, try default.}}?
Can we add a comment to explain why ClassNotFoundException is expected in 
the test (the fallback idea is sketched below)?
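
For context, here is a minimal sketch of the fallback being discussed; this is an illustration only, not the actual SerializedException code or the patch:
{code}
import java.lang.reflect.Constructor;

// Illustration of the fallback, not the actual SerializedException code.
final class ExceptionInstantiator {
  static Throwable instantiate(Class<? extends Throwable> cls, String message)
      throws Exception {
    try {
      // Preferred: the (String message) constructor.
      Constructor<? extends Throwable> cn = cls.getConstructor(String.class);
      return cn.newInstance(message);
    } catch (NoSuchMethodException e) {
      // e.g. ClosedChannelException has no (String) constructor;
      // fall back to the default constructor so the inner exception
      // can still be propagated.
      Constructor<? extends Throwable> cn = cls.getConstructor();
      return cn.newInstance();
    }
  }
}
{code}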


> SerializedException should also try to instantiate internal exception with 
> the default constructor
> --
>
> Key: YARN-3745
> URL: https://issues.apache.org/jira/browse/YARN-3745
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Lavkesh Lahngir
>Assignee: Lavkesh Lahngir
> Attachments: YARN-3745.1.patch, YARN-3745.patch
>
>
> While deserialising a SerializedException, it tries to create the internal 
> exception in instantiateException() with cn = 
> cls.getConstructor(String.class).
> If cls does not have a constructor with a String parameter, it throws 
> NoSuchMethodException, for example for the ClosedChannelException class.
> We should also try to instantiate the exception with the default constructor so that 
> the inner exception can be propagated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-06-04 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573673#comment-14573673
 ] 

Wangda Tan commented on YARN-3769:
--

Thanks [~eepayne], I reassigned it to myself. I will upload a design doc shortly 
for review.

> Preemption occurring unnecessarily because preemption doesn't consider user 
> limit
> -
>
> Key: YARN-3769
> URL: https://issues.apache.org/jira/browse/YARN-3769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0, 2.7.0, 2.8.0
>Reporter: Eric Payne
>Assignee: Wangda Tan
>
> We are seeing the preemption monitor preempting containers from queue A and 
> then seeing the capacity scheduler giving them immediately back to queue A. 
> This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-06-04 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan reassigned YARN-3769:


Assignee: Wangda Tan  (was: Eric Payne)

> Preemption occurring unnecessarily because preemption doesn't consider user 
> limit
> -
>
> Key: YARN-3769
> URL: https://issues.apache.org/jira/browse/YARN-3769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0, 2.7.0, 2.8.0
>Reporter: Eric Payne
>Assignee: Wangda Tan
>
> We are seeing the preemption monitor preempting containers from queue A and 
> then seeing the capacity scheduler giving them immediately back to queue A. 
> This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-06-04 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573670#comment-14573670
 ] 

Eric Payne commented on YARN-3769:
--

[~leftnoteasy]
bq. If you think it's fine, could I take a shot at it?
It sounds like it would work. It's fine with me if you want to work on that.

> Preemption occurring unnecessarily because preemption doesn't consider user 
> limit
> -
>
> Key: YARN-3769
> URL: https://issues.apache.org/jira/browse/YARN-3769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0, 2.7.0, 2.8.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>
> We are seeing the preemption monitor preempting containers from queue A and 
> then seeing the capacity scheduler giving them immediately back to queue A. 
> This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-06-04 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573667#comment-14573667
 ] 

Wangda Tan commented on YARN-3769:
--

[~eepayne], Exactly.

> Preemption occurring unnecessarily because preemption doesn't consider user 
> limit
> -
>
> Key: YARN-3769
> URL: https://issues.apache.org/jira/browse/YARN-3769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0, 2.7.0, 2.8.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>
> We are seeing the preemption monitor preempting containers from queue A and 
> then seeing the capacity scheduler giving them immediately back to queue A. 
> This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-06-04 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573664#comment-14573664
 ] 

Eric Payne commented on YARN-3769:
--

[~leftnoteasy],
{quote}
One thing I've thought about for a while is adding a "lazy preemption" mechanism, 
which is: when a container is marked preempted and has waited for 
max_wait_before_time, it becomes a "can_be_killed" container. If another 
queue can allocate on a node with a "can_be_killed" container, such a 
container will be killed immediately to make room for the new containers.
{quote}
IIUC, in your proposal, the preemption monitor would mark the containers as 
preemptable, and then after some configurable wait period, the capacity 
scheduler would be the one to do the killing if it finds that it needs the 
resources on that node. Is my understanding correct?

> Preemption occurring unnecessarily because preemption doesn't consider user 
> limit
> -
>
> Key: YARN-3769
> URL: https://issues.apache.org/jira/browse/YARN-3769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0, 2.7.0, 2.8.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>
> We are seeing the preemption monitor preempting containers from queue A and 
> then seeing the capacity scheduler giving them immediately back to queue A. 
> This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3766) ATS Web UI breaks because of YARN-3467

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573659#comment-14573659
 ] 

Hudson commented on YARN-3766:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7971 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7971/])
YARN-3766. Fixed the apps table column error of generic history web UI. 
Contributed by Xuan Gong. (zjshen: rev 18dd01d6bf67f4d522b947454c1f4347d1cbbc19)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AHSView.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java


> ATS Web UI breaks because of YARN-3467
> --
>
> Key: YARN-3766
> URL: https://issues.apache.org/jira/browse/YARN-3766
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp, yarn
>Affects Versions: 2.8.0
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Fix For: 2.8.0
>
> Attachments: ATSWebPageBreaks.png, YARN-3766.1.patch
>
>
> The ATS web UI breaks because of the following changes made in YARN-3467.
> {code}
> +++ 
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java
> @@ -52,9 +52,9 @@ private static String getAppsTableColumnDefs(
>.append(", 'mRender': renderHadoopDate }")
>.append("\n, {'sType':'numeric', bSearchable:false, 'aTargets':");
>  if (isFairSchedulerPage) {
> -  sb.append("[11]");
> +  sb.append("[13]");
>  } else if (isResourceManager) {
> -  sb.append("[10]");
> +  sb.append("[12]");
>  } else {
>sb.append("[9]");
>  }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-06-04 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573638#comment-14573638
 ] 

Wangda Tan commented on YARN-3769:
--

[~eepayne],
This is a very interesting problem; actually, user-limit is not the only cause of it.

For example, fair ordering (YARN-3306), hard locality requirements (I want 
resources from rackA and nodeX only), the AM resource limit, and, in the near future, 
constraints (YARN-3409) can all lead to resources being preempted from 
one queue while the other queue cannot use them because of its specific resource 
requirements and limits.

One thing I've thought about for a while is adding a "lazy preemption" mechanism, 
which is: when a container is marked preempted and has waited for 
max_wait_before_time, it becomes a "can_be_killed" container. If another 
queue can allocate on a node with a "can_be_killed" container, such a 
container will be killed immediately to make room for the new containers.

This mechanism means the preemption policy doesn't need to consider complex 
resource requirements and limits inside a queue, and it also avoids killing 
containers unnecessarily.

If you think it's fine, could I take a shot at it?

Thoughts? [~vinodkv].

> Preemption occurring unnecessarily because preemption doesn't consider user 
> limit
> -
>
> Key: YARN-3769
> URL: https://issues.apache.org/jira/browse/YARN-3769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0, 2.7.0, 2.8.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>
> We are seeing the preemption monitor preempting containers from queue A and 
> then seeing the capacity scheduler giving them immediately back to queue A. 
> This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3766) ATS Web UI breaks because of YARN-3467

2015-06-04 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573633#comment-14573633
 ] 

Zhijie Shen commented on YARN-3766:
---

Patch looks good. Tried it locally and the web UI has been fixed. Will commit 
it.

> ATS Web UI breaks because of YARN-3467
> --
>
> Key: YARN-3766
> URL: https://issues.apache.org/jira/browse/YARN-3766
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, webapp, yarn
>Affects Versions: 2.8.0
>Reporter: Xuan Gong
>Assignee: Xuan Gong
>Priority: Blocker
> Attachments: ATSWebPageBreaks.png, YARN-3766.1.patch
>
>
> The ATS web UI breaks because of the following changes made in YARN-3467.
> {code}
> +++ 
> hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java
> @@ -52,9 +52,9 @@ private static String getAppsTableColumnDefs(
>.append(", 'mRender': renderHadoopDate }")
>.append("\n, {'sType':'numeric', bSearchable:false, 'aTargets':");
>  if (isFairSchedulerPage) {
> -  sb.append("[11]");
> +  sb.append("[13]");
>  } else if (isResourceManager) {
> -  sb.append("[10]");
> +  sb.append("[12]");
>  } else {
>sb.append("[9]");
>  }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-06-04 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573619#comment-14573619
 ] 

Eric Payne commented on YARN-3769:
--

The following configuration will cause this:

|| queue || capacity || max || pending || used || user limit ||
| root | 100 | 100 | 40 | 90 | N/A |
| A | 10 | 100 | 20 | 70 | 70 |
| B | 10 | 100 | 20 | 20 | 20 |

One app is running in each queue. Both apps are asking for more resources, but 
they have each reached their user limit, so even though both are asking for 
more and there are resources available, no more resources are allocated to 
either app.

The preemption monitor will see that {{B}} is asking for a lot more resources, 
and it will see that {{B}} is more underserved than {{A}}, so the preemption 
monitor will try to make the queues balance by preempting resources (10, for 
example) from {{A}}.

|| queue || capacity || max || pending || used || user limit ||
| root | 100 | 100 | 50 | 80 | N/A |
| A | 10 | 100 | 30 | 60 | 70 |
| B | 10 | 100 | 20 | 20 | 20 |

However, when the capacity scheduler tries to give that container to the app in 
{{B}}, the app will recognize that it has no headroom, and refuse the 
container. So the capacity scheduler offers the container again to the app in 
{{A}}, which accepts it because it has headroom now, and the process starts 
over again.

Note that this happens even when used cluster resources are below 100% because 
the used + pending for the cluster would put it above 100%.

> Preemption occurring unnecessarily because preemption doesn't consider user 
> limit
> -
>
> Key: YARN-3769
> URL: https://issues.apache.org/jira/browse/YARN-3769
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0, 2.7.0, 2.8.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>
> We are seeing the preemption monitor preempting containers from queue A and 
> then seeing the capacity scheduler giving them immediately back to queue A. 
> This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3769) Preemption occurring unnecessarily because preemption doesn't consider user limit

2015-06-04 Thread Eric Payne (JIRA)
Eric Payne created YARN-3769:


 Summary: Preemption occurring unnecessarily because preemption 
doesn't consider user limit
 Key: YARN-3769
 URL: https://issues.apache.org/jira/browse/YARN-3769
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.7.0, 2.6.0, 2.8.0
Reporter: Eric Payne
Assignee: Eric Payne


We are seeing the preemption monitor preempting containers from queue A and 
then seeing the capacity scheduler giving them immediately back to queue A. 
This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3017) ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID

2015-06-04 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573593#comment-14573593
 ] 

zhihai xu commented on YARN-3017:
-

Hi [~rohithsharma], thanks for the information.
Sorry, I am not familiar with rolling upgrade. Could you give a little more 
detail about how this could break the rolling upgrade?
I saw that the ContainerId format was already changed by YARN-2562 in the 2.6.0 release eight 
months ago. Compared to the change in YARN-2562, this patch is minor, because 
it only changes {{ContainerId#toString}}; the current 
{{ContainerId#fromString}} supports both the current container string format 
and the new container string format.
CC [~ozawa] for the impact of the ContainerId format change.

> ContainerID in ResourceManager Log Has Slightly Different Format From 
> AppAttemptID
> --
>
> Key: YARN-3017
> URL: https://issues.apache.org/jira/browse/YARN-3017
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: MUFEED USMAN
>Priority: Minor
>  Labels: PatchAvailable
> Attachments: YARN-3017.patch, YARN-3017_1.patch, YARN-3017_2.patch
>
>
> Not sure if this should be filed as a bug or not.
> In the ResourceManager log in the events surrounding the creation of a new
> application attempt,
> ...
> ...
> 2014-11-14 17:45:37,258 INFO
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching
> masterappattempt_1412150883650_0001_02
> ...
> ...
> The application attempt has the ID format "_1412150883650_0001_02".
> Whereas the associated ContainerID goes by "_1412150883650_0001_02_".
> ...
> ...
> 2014-11-14 17:45:37,260 INFO
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting 
> up
> container Container: [ContainerId: container_1412150883650_0001_02_01,
> NodeId: n67:55933, NodeHttpAddress: n67:8042, Resource:  vCores:1,
> disks:0.0>, Priority: 0, Token: Token { kind: ContainerToken, service:
> 10.10.70.67:55933 }, ] for AM appattempt_1412150883650_0001_02
> ...
> ...
> Curious to know if this is kept like that for a reason. If not while using
> filtering tools to, say, grep events surrounding a specific attempt by the
> numeric ID part information may slip out during troubleshooting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3768) Index out of range exception with environment variables without values

2015-06-04 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573539#comment-14573539
 ] 

zhihai xu commented on YARN-3768:
-

Hi [~joeferner], that is a good find. I can see that the change in MAPREDUCE-5965 
may trigger this bug. I can take up this issue if you don't mind. Thanks for 
reporting it.

> Index out of range exception with environment variables without values
> --
>
> Key: YARN-3768
> URL: https://issues.apache.org/jira/browse/YARN-3768
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.5.0
>Reporter: Joe Ferner
>
> Looking at line 80 of org.apache.hadoop.yarn.util.Apps, an index out of range 
> exception occurs if an environment variable is encountered without a value.
> I believe this occurs because Java will not return trailing empty strings from the 
> split method. Similar to this: 
> http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3044) [Event producers] Implement RM writing app lifecycle events to ATS

2015-06-04 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573515#comment-14573515
 ] 

Zhijie Shen commented on YARN-3044:
---

I'm not sure because as far as I can tell, NM's impl is different from RM's, 
but it's up to you to figure out the proper solution:-)

> [Event producers] Implement RM writing app lifecycle events to ATS
> --
>
> Key: YARN-3044
> URL: https://issues.apache.org/jira/browse/YARN-3044
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Naganarasimha G R
> Attachments: YARN-3044-YARN-2928.004.patch, 
> YARN-3044-YARN-2928.005.patch, YARN-3044-YARN-2928.006.patch, 
> YARN-3044-YARN-2928.007.patch, YARN-3044-YARN-2928.008.patch, 
> YARN-3044-YARN-2928.009.patch, YARN-3044-YARN-2928.010.patch, 
> YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch, 
> YARN-3044.20150416-1.patch
>
>
> Per design in YARN-2928, implement RM writing app lifecycle events to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3733) Fix DominantRC#compare() does not work as expected if cluster resource is empty

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573512#comment-14573512
 ] 

Hudson commented on YARN-3733:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7970 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7970/])
Add missing test file of YARN-3733 (wangda: rev 
405bbcf68c32d8fd8a83e46e686eacd14e5a533c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/util/resource/TestResourceCalculator.java


> Fix DominantRC#compare() does not work as expected if cluster resource is 
> empty
> ---
>
> Key: YARN-3733
> URL: https://issues.apache.org/jira/browse/YARN-3733
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
> Environment: Suse 11 Sp3 , 2 NM , 2 RM
> one NM - 3 GB 6 v core
>Reporter: Bibin A Chundatt
>Assignee: Rohith
>Priority: Blocker
> Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, 
> 0002-YARN-3733.patch, YARN-3733.patch
>
>
> Steps to reproduce
> =
> 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster)
> 2. Configure map and reduce size to 512 MB  after changing scheduler minimum 
> size to 512 MB
> 3. Configure capacity scheduler and AM limit to .5 
> (DominantResourceCalculator is configured)
> 4. Submit 30 concurrent task 
> 5. Switch RM
> Actual
> =
> For 12 Jobs AM gets allocated and all 12 starts running
> No other Yarn child is initiated , *all 12 Jobs in Running state for ever*
> Expected
> ===
> Only 6 should be running at a time since max AM allocated is .5 (3072 MB)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3768) Index out of range exception with environment variables without values

2015-06-04 Thread Joe Ferner (JIRA)
Joe Ferner created YARN-3768:


 Summary: Index out of range exception with environment variables 
without values
 Key: YARN-3768
 URL: https://issues.apache.org/jira/browse/YARN-3768
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.5.0
Reporter: Joe Ferner


Looking at line 80 of org.apache.hadoop.yarn.util.Apps, an index out of range 
exception occurs if an environment variable is encountered without a value.

I believe this occurs because Java will not return trailing empty strings from the split 
method. Similar to this: 
http://stackoverflow.com/questions/14602062/java-string-split-removed-empty-values
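
A minimal illustration of the split behaviour described above (plain Java, not the actual Apps.java code):
{code}
// Plain-Java illustration of String.split() dropping trailing empty strings.
public class SplitDemo {
  public static void main(String[] args) {
    String[] withValue = "JAVA_HOME=/usr/java".split("=");
    System.out.println(withValue.length);    // 2

    String[] withoutValue = "EMPTY_VAR=".split("=");
    System.out.println(withoutValue.length); // 1 -> withoutValue[1] would throw
                                             //      ArrayIndexOutOfBoundsException

    // A negative limit keeps the trailing empty string:
    String[] kept = "EMPTY_VAR=".split("=", -1);
    System.out.println(kept.length);         // 2, kept[1] is ""
  }
}
{code}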



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN

2015-06-04 Thread JIRA

[ 
https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573468#comment-14573468
 ] 

kassiano josé matteussi commented on YARN-2139:
---

Dear all,

I have been studying resource management for Hadoop applications running wrapped in 
Linux containers, and I have had trouble restricting disk I/O with cgroups 
(bps_write, bps_read). 

Does anybody know if it is possible to do so?

I have heard that limiting I/O with cgroups only applies to synchronous 
writes (SYNC), and that is why it wouldn't work well with Hadoop + HDFS. Is 
this still true in more recent kernel implementations?

Best Regards,
Kassiano

> [Umbrella] Support for Disk as a Resource in YARN 
> --
>
> Key: YARN-2139
> URL: https://issues.apache.org/jira/browse/YARN-2139
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Wei Yan
> Attachments: Disk_IO_Isolation_Scheduling_3.pdf, 
> Disk_IO_Scheduling_Design_1.pdf, Disk_IO_Scheduling_Design_2.pdf, 
> YARN-2139-prototype-2.patch, YARN-2139-prototype.patch
>
>
> YARN should consider disk as another resource for (1) scheduling tasks on 
> nodes, (2) isolation at runtime, (3) spindle locality. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2392) add more diags about app retry limits on AM failures

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573325#comment-14573325
 ] 

Hudson commented on YARN-2392:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #7968 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7968/])
YARN-2392. Add more diags about app retry limits on AM failures. Contributed by 
Steve Loughran (jianhe: rev 1970ca7cbcdb7efa160d0cedc2e3e22c1401fad6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* hadoop-yarn-project/CHANGES.txt


> add more diags about app retry limits on AM failures
> 
>
> Key: YARN-2392
> URL: https://issues.apache.org/jira/browse/YARN-2392
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-2392-001.patch, YARN-2392-002.patch, 
> YARN-2392-002.patch
>
>
> # when an app fails the failure count is shown, but not what the global + 
> local limits are. If the two are different, they should both be printed. 
> # the YARN-2242 strings don't have enough whitespace between text and the URL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3767) Yarn Scheduler Load Simulator does not work

2015-06-04 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3767:
---
Assignee: (was: Varun Saxena)

> Yarn Scheduler Load Simulator does not work
> ---
>
> Key: YARN-3767
> URL: https://issues.apache.org/jira/browse/YARN-3767
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
> Environment: OS X 10.10.  JDK 1.7
>Reporter: David Kjerrumgaard
>
> Running the SLS, as per the instructions on the web results in a 
> NullPointerException being thrown.
> Steps followed to create error:
> 1) Download Apache Hadoop 2.7.0 tarball from Apache site
> 2) Untar 2.7.0 tarball into /opt directory
> 3) Execute the following command: 
> /opt/hadoop-2.7.0/share/hadoop/tools/sls//bin/slsrun.sh 
> --input-rumen=/opt/hadoop-2.7.0/share/hadoop/tools/sls/sample-data/2jobs2min-rumen-jh.json
>  --output-dir=/tmp
> Results in the following error:
> 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2118.smile.com:2 Node Transitioned 
> from NEW to RUNNING
> 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node 
> a2118.smile.com:2 clusterResource: 
> 15/06/04 10:25:41 INFO util.RackResolver: Resolved a2115.smile.com to 
> /default-rack
> 15/06/04 10:25:41 INFO resourcemanager.ResourceTrackerService: NodeManager 
> from node a2115.smile.com(cmPort: 3 httpPort: 80) registered with capability: 
> , assigned nodeId a2115.smile.com:3
> 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2115.smile.com:3 Node Transitioned 
> from NEW to RUNNING
> 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node 
> a2115.smile.com:3 clusterResource: 
> Exception in thread "main" java.lang.RuntimeException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134)
>   at 
> org.apache.hadoop.yarn.sls.SLSRunner.startAMFromRumenTraces(SLSRunner.java:398)
>   at org.apache.hadoop.yarn.sls.SLSRunner.startAM(SLSRunner.java:250)
>   at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:145)
>   at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:528)
> Caused by: java.lang.NullPointerException
>   at 
> java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
>   at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:126)
>   ... 4 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2392) add more diags about app retry limits on AM failures

2015-06-04 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573293#comment-14573293
 ] 

Jian He commented on YARN-2392:
---

looks good, committing

> add more diags about app retry limits on AM failures
> 
>
> Key: YARN-2392
> URL: https://issues.apache.org/jira/browse/YARN-2392
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: YARN-2392-001.patch, YARN-2392-002.patch, 
> YARN-2392-002.patch
>
>
> # when an app fails the failure count is shown, but not what the global + 
> local limits are. If the two are different, they should both be printed. 
> # the YARN-2242 strings don't have enough whitespace between text and the URL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3698) Make task attempt log files accessible from webapps & correct node-manager redirection

2015-06-04 Thread Sreenath Somarajapuram (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sreenath Somarajapuram updated YARN-3698:
-
Summary: Make task attempt log files accessible from webapps & correct 
node-manager redirection  (was: Make task attempt log files accessible from 
webapps)

> Make task attempt log files accessible from webapps & correct node-manager 
> redirection
> --
>
> Key: YARN-3698
> URL: https://issues.apache.org/jira/browse/YARN-3698
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Sreenath Somarajapuram
>
> Currently we don't have direct access to an attempt's log file from web apps. 
> The only available option is through jobhistory, and that provides an HTML 
> view of the log.
> Requirements:
> # A link to access the raw log file.
> # A variant of the link with the following headers set, this enables direct 
> download of the file across all browsers.
> Content-Disposition: attachment; filename="attempt-id.log"
> Content-Type of text/plain
> # Node manager redirects an attempt syslog view to the container view. Hence 
> we are not able to view the logs of a specific attempt.
> Before redirection: 
> http://sandbox.hortonworks.com:8042/node/containerlogs/container_1432048982252_0004_01_02/root/syslog_attempt_1432048982252_0004_1_02_00_0
> After redirection: 
> http://sandbox.hortonworks.com:19888/jobhistory/logs/sandbox.hortonworks.com:45454/container_1432048982252_0004_01_02/container_1432048982252_0004_01_02/root



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3698) Make task attempt log files accessible from webapps

2015-06-04 Thread Sreenath Somarajapuram (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sreenath Somarajapuram updated YARN-3698:
-
Description: 
Currently we don't have direct access to an attempt's log file from web apps. 
The only available option is through jobhistory, and that provides an HTML view 
of the log.

Requirements:
# A link to access the raw log file.
# A variant of the link with the following headers set, this enables direct 
download of the file across all browsers.
Content-Disposition: attachment; filename="attempt-id.log"
Content-Type of text/plain
# Node manager redirects an attempt syslog view to the container view. Hence we 
are not able to view the logs of a specific attempt.
Before redirection: 
http://sandbox.hortonworks.com:8042/node/containerlogs/container_1432048982252_0004_01_02/root/syslog_attempt_1432048982252_0004_1_02_00_0
After redirection: 
http://sandbox.hortonworks.com:19888/jobhistory/logs/sandbox.hortonworks.com:45454/container_1432048982252_0004_01_02/container_1432048982252_0004_01_02/root

  was:
Currently we don't have direct access to an attempt's log file from web apps. 
The only available option is through jobhistory, and that provides an HTML view 
of the log.

Requirements:
# A link to access the raw log file.
# A variant of the link with the following headers set, this enables direct 
download of the file across all browsers.
Content-Disposition: attachment; filename="attempt-id.log"
Content-Type of text/plain


> Make task attempt log files accessible from webapps
> ---
>
> Key: YARN-3698
> URL: https://issues.apache.org/jira/browse/YARN-3698
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Sreenath Somarajapuram
>
> Currently we don't have direct access to an attempt's log file from web apps. 
> The only available option is through jobhistory, and that provides an HTML 
> view of the log.
> Requirements:
> # A link to access the raw log file.
> # A variant of the link with the following headers set, this enables direct 
> download of the file across all browsers.
> Content-Disposition: attachment; filename="attempt-id.log"
> Content-Type of text/plain
> # Node manager redirects an attempt syslog view to the container view. Hence 
> we are not able to view the logs of a specific attempt.
> Before redirection: 
> http://sandbox.hortonworks.com:8042/node/containerlogs/container_1432048982252_0004_01_02/root/syslog_attempt_1432048982252_0004_1_02_00_0
> After redirection: 
> http://sandbox.hortonworks.com:19888/jobhistory/logs/sandbox.hortonworks.com:45454/container_1432048982252_0004_01_02/container_1432048982252_0004_01_02/root



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2513) Host framework UIs in YARN for use with the ATS

2015-06-04 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573288#comment-14573288
 ] 

Zhijie Shen commented on YARN-2513:
---

As it's valuable to some existing ATS use cases, let's try to get the patch in 
and target 2.8.

[~jeagles], three comments about the patch:

1. Shall we add "yarn.timeline-service.ui-names" to yarn-default.xml too, like 
"yarn.nodemanager.aux-services"?

2. Can we add some text in TimelineServer.md to document the configs and 
explain how to install framework UIs?

3. Can we add a test case to validate and showcase that ATS can load a 
framework UI (e.g., a single helloworld.html)?

> Host framework UIs in YARN for use with the ATS
> ---
>
> Key: YARN-2513
> URL: https://issues.apache.org/jira/browse/YARN-2513
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: YARN-2513-v1.patch, YARN-2513-v2.patch, 
> YARN-2513.v3.patch
>
>
> Allow for pluggable UIs as described by TEZ-8. Yarn can provide the 
> infrastructure to host java script and possible java UIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3764) CapacityScheduler should forbid moving LeafQueue from one parent to another

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573276#comment-14573276
 ] 

Hudson commented on YARN-3764:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7966 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7966/])
YARN-3764. CapacityScheduler should forbid moving LeafQueue from one parent to 
another. Contributed by Wangda Tan (jianhe: rev 
6ad4e59cfc111a92747fdb1fb99cc6378044832a)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestQueueParsing.java


> CapacityScheduler should forbid moving LeafQueue from one parent to another
> ---
>
> Key: YARN-3764
> URL: https://issues.apache.org/jira/browse/YARN-3764
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Blocker
> Fix For: 2.7.1
>
> Attachments: YARN-3764.1.patch
>
>
> Currently CapacityScheduler doesn't handle the case well, for example:
> A queue structure:
> {code}
>       root
>        |
>     a (100)
>      /   \
>  x (50)  y (50)
> {code}
> And reinitialize using the following structure:
> {code}
>          root
>         /    \
>     a (50)  x (50)
>       |
>    y (100)
> {code}
> The actual queue structure after reinitialization is:
> {code}
>          root
>         /    \
>     a (50)  x (50)
>      /   \
>  x (50)  y (100)
> {code}
> We should forbid admins from doing that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3767) Yarn Scheduler Load Simulator does not work

2015-06-04 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573273#comment-14573273
 ] 

Varun Saxena commented on YARN-3767:


Yes, it will work if you copy {{sls-runner.xml}} to {{etc/hadoop}}. This is 
mentioned in the documentation as well. Refer to: 
http://hadoop.apache.org/docs/r2.4.1/hadoop-sls/SchedulerLoadSimulator.html#Step_1:_Configure_Hadoop_and_the_simulator

It mentions "Before we start, make sure Hadoop and the simulator are configured 
well. All configuration files for Hadoop and the simulator should be placed in 
directory $HADOOP_ROOT/etc/hadoop, where the ResourceManager and Yarn scheduler 
load their configurations. Directory 
$HADOOP_ROOT/share/hadoop/tools/sls/sample-conf/ provides several example 
configurations, that can be used to start a demo."

> Yarn Scheduler Load Simulator does not work
> ---
>
> Key: YARN-3767
> URL: https://issues.apache.org/jira/browse/YARN-3767
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
> Environment: OS X 10.10.  JDK 1.7
>Reporter: David Kjerrumgaard
>Assignee: Varun Saxena
>
> Running the SLS, as per the instructions on the web results in a 
> NullPointerException being thrown.
> Steps followed to create error:
> 1) Download Apache Hadoop 2.7.0 tarball from Apache site
> 2) Untar 2.7.0 tarball into /opt directory
> 3) Execute the following command: 
> /opt/hadoop-2.7.0/share/hadoop/tools/sls//bin/slsrun.sh 
> --input-rumen=/opt/hadoop-2.7.0/share/hadoop/tools/sls/sample-data/2jobs2min-rumen-jh.json
>  --output-dir=/tmp
> Results in the following error:
> 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2118.smile.com:2 Node Transitioned 
> from NEW to RUNNING
> 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node 
> a2118.smile.com:2 clusterResource: 
> 15/06/04 10:25:41 INFO util.RackResolver: Resolved a2115.smile.com to 
> /default-rack
> 15/06/04 10:25:41 INFO resourcemanager.ResourceTrackerService: NodeManager 
> from node a2115.smile.com(cmPort: 3 httpPort: 80) registered with capability: 
> , assigned nodeId a2115.smile.com:3
> 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2115.smile.com:3 Node Transitioned 
> from NEW to RUNNING
> 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node 
> a2115.smile.com:3 clusterResource: 
> Exception in thread "main" java.lang.RuntimeException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134)
>   at 
> org.apache.hadoop.yarn.sls.SLSRunner.startAMFromRumenTraces(SLSRunner.java:398)
>   at org.apache.hadoop.yarn.sls.SLSRunner.startAM(SLSRunner.java:250)
>   at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:145)
>   at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:528)
> Caused by: java.lang.NullPointerException
>   at 
> java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
>   at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:126)
>   ... 4 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3767) Yarn Scheduler Load Simulator does not work

2015-06-04 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573269#comment-14573269
 ] 

Varun Saxena commented on YARN-3767:


This belongs to YARN.

> Yarn Scheduler Load Simulator does not work
> ---
>
> Key: YARN-3767
> URL: https://issues.apache.org/jira/browse/YARN-3767
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
> Environment: OS X 10.10.  JDK 1.7
>Reporter: David Kjerrumgaard
>Assignee: Varun Saxena
>
> Running the SLS, as per the instructions on the web results in a 
> NullPointerException being thrown.
> Steps followed to create error:
> 1) Download Apache Hadoop 2.7.0 tarball from Apache site
> 2) Untar 2.7.0 tarball into /opt directory
> 3) Execute the following command: 
> /opt/hadoop-2.7.0/share/hadoop/tools/sls//bin/slsrun.sh 
> --input-rumen=/opt/hadoop-2.7.0/share/hadoop/tools/sls/sample-data/2jobs2min-rumen-jh.json
>  --output-dir=/tmp
> Results in the following error:
> 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2118.smile.com:2 Node Transitioned 
> from NEW to RUNNING
> 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node 
> a2118.smile.com:2 clusterResource: 
> 15/06/04 10:25:41 INFO util.RackResolver: Resolved a2115.smile.com to 
> /default-rack
> 15/06/04 10:25:41 INFO resourcemanager.ResourceTrackerService: NodeManager 
> from node a2115.smile.com(cmPort: 3 httpPort: 80) registered with capability: 
> , assigned nodeId a2115.smile.com:3
> 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2115.smile.com:3 Node Transitioned 
> from NEW to RUNNING
> 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node 
> a2115.smile.com:3 clusterResource: 
> Exception in thread "main" java.lang.RuntimeException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134)
>   at 
> org.apache.hadoop.yarn.sls.SLSRunner.startAMFromRumenTraces(SLSRunner.java:398)
>   at org.apache.hadoop.yarn.sls.SLSRunner.startAM(SLSRunner.java:250)
>   at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:145)
>   at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:528)
> Caused by: java.lang.NullPointerException
>   at 
> java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
>   at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:126)
>   ... 4 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Moved] (YARN-3767) Yarn Scheduler Load Simulator does not work

2015-06-04 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena moved HADOOP-12062 to YARN-3767:
-

  Component/s: (was: tools)
Affects Version/s: (was: 2.7.0)
   2.7.0
  Key: YARN-3767  (was: HADOOP-12062)
  Project: Hadoop YARN  (was: Hadoop Common)

> Yarn Scheduler Load Simulator does not work
> ---
>
> Key: YARN-3767
> URL: https://issues.apache.org/jira/browse/YARN-3767
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
> Environment: OS X 10.10.  JDK 1.7
>Reporter: David Kjerrumgaard
>Assignee: Varun Saxena
>
> Running the SLS, as per the instructions on the web, results in a 
> NullPointerException being thrown.
> Steps followed to reproduce the error:
> 1) Download Apache Hadoop 2.7.0 tarball from Apache site
> 2) Untar 2.7.0 tarball into /opt directory
> 3) Execute the following command: 
> /opt/hadoop-2.7.0/share/hadoop/tools/sls//bin/slsrun.sh 
> --input-rumen=/opt/hadoop-2.7.0/share/hadoop/tools/sls/sample-data/2jobs2min-rumen-jh.json
>  --output-dir=/tmp
> Results in the following error:
> 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2118.smile.com:2 Node Transitioned 
> from NEW to RUNNING
> 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node 
> a2118.smile.com:2 clusterResource: 
> 15/06/04 10:25:41 INFO util.RackResolver: Resolved a2115.smile.com to 
> /default-rack
> 15/06/04 10:25:41 INFO resourcemanager.ResourceTrackerService: NodeManager 
> from node a2115.smile.com(cmPort: 3 httpPort: 80) registered with capability: 
> , assigned nodeId a2115.smile.com:3
> 15/06/04 10:25:41 INFO rmnode.RMNodeImpl: a2115.smile.com:3 Node Transitioned 
> from NEW to RUNNING
> 15/06/04 10:25:41 INFO capacity.CapacityScheduler: Added node 
> a2115.smile.com:3 clusterResource: 
> Exception in thread "main" java.lang.RuntimeException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134)
>   at 
> org.apache.hadoop.yarn.sls.SLSRunner.startAMFromRumenTraces(SLSRunner.java:398)
>   at org.apache.hadoop.yarn.sls.SLSRunner.startAM(SLSRunner.java:250)
>   at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:145)
>   at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:528)
> Caused by: java.lang.NullPointerException
>   at 
> java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
>   at 
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:988)
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:126)
>   ... 4 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2513) Host framework UIs in YARN for use with the ATS

2015-06-04 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2513:
--
Target Version/s: 2.8.0

> Host framework UIs in YARN for use with the ATS
> ---
>
> Key: YARN-2513
> URL: https://issues.apache.org/jira/browse/YARN-2513
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: YARN-2513-v1.patch, YARN-2513-v2.patch, 
> YARN-2513.v3.patch
>
>
> Allow for pluggable UIs as described by TEZ-8. YARN can provide the 
> infrastructure to host JavaScript and possibly Java UIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3733) Fix DominantRC#compare() does not work as expected if cluster resource is empty

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573202#comment-14573202
 ] 

Hudson commented on YARN-3733:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7965 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7965/])
YARN-3733. Fix DominantRC#compare() does not work as expected if cluster 
resource is empty. (Rohith Sharmaks via wangda) (wangda: rev 
ebd797c48fe236b404cf3a125ac9d1f7714e291e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/resource/DominantResourceCalculator.java


> Fix DominantRC#compare() does not work as expected if cluster resource is 
> empty
> ---
>
> Key: YARN-3733
> URL: https://issues.apache.org/jira/browse/YARN-3733
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
> Environment: SUSE 11 SP3, 2 NM, 2 RM
> one NM: 3 GB, 6 vcores
>Reporter: Bibin A Chundatt
>Assignee: Rohith
>Priority: Blocker
> Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, 
> 0002-YARN-3733.patch, YARN-3733.patch
>
>
> Steps to reproduce
> =
> 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster)
> 2. Configure map and reduce size to 512 MB  after changing scheduler minimum 
> size to 512 MB
> 3. Configure capacity scheduler and AM limit to .5 
> (DominantResourceCalculator is configured)
> 4. Submit 30 concurrent tasks
> 5. Switch RM
> Actual
> =
> AMs get allocated for 12 jobs and all 12 start running
> No other YARN child is initiated, *all 12 jobs stay in RUNNING state forever*
> Expected
> ===
> Only 6 should be running at a time since the max AM limit is .5 (3072 MB)
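For context on why the empty cluster resource matters here, a minimal sketch (plain Java, not the DominantResourceCalculator source) of how a dominant-share comparison degenerates when the cluster resource is zero: every share becomes Infinity (or NaN for 0/0), the comparison collapses to 0, and a limit check that relies on it stops rejecting anything.
{code}
public class EmptyClusterCompareDemo {
  // Dominant share = the larger of the memory and vcore fractions of the cluster.
  static double dominantShare(long cMem, long cVcores, long mem, long vcores) {
    return Math.max((double) mem / cMem, (double) vcores / cVcores);
  }

  static int compare(long cMem, long cVcores,
                     long lhsMem, long lhsVcores,
                     long rhsMem, long rhsVcores) {
    return Double.compare(dominantShare(cMem, cVcores, lhsMem, lhsVcores),
                          dominantShare(cMem, cVcores, rhsMem, rhsVcores));
  }

  public static void main(String[] args) {
    // A 512 MB AM container compared against a 3072 MB AM limit, empty cluster:
    System.out.println(compare(0, 0, 512, 1, 3072, 6));     // prints 0 (no ordering)
    // The same comparison against a non-empty cluster orders as expected:
    System.out.println(compare(6144, 12, 512, 1, 3072, 6)); // prints -1
  }
}
{code}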



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3733) Fix DominantRC#compare() does not work as expected if cluster resource is empty

2015-06-04 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573182#comment-14573182
 ] 

Wangda Tan commented on YARN-3733:
--

Great! Committing...

> Fix DominantRC#compare() does not work as expected if cluster resource is 
> empty
> ---
>
> Key: YARN-3733
> URL: https://issues.apache.org/jira/browse/YARN-3733
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
> Environment: SUSE 11 SP3, 2 NM, 2 RM
> one NM: 3 GB, 6 vcores
>Reporter: Bibin A Chundatt
>Assignee: Rohith
>Priority: Blocker
> Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, 
> 0002-YARN-3733.patch, YARN-3733.patch
>
>
> Steps to reproduce
> =
> 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster)
> 2. Configure map and reduce size to 512 MB  after changing scheduler minimum 
> size to 512 MB
> 3. Configure capacity scheduler and AM limit to .5 
> (DominantResourceCalculator is configured)
> 4. Submit 30 concurrent tasks
> 5. Switch RM
> Actual
> =
> AMs get allocated for 12 jobs and all 12 start running
> No other YARN child is initiated, *all 12 jobs stay in RUNNING state forever*
> Expected
> ===
> Only 6 should be running at a time since the max AM limit is .5 (3072 MB)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3733) Fix DominantRC#compare() does not work as expected if cluster resource is empty

2015-06-04 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-3733:
-
Summary: Fix DominantRC#compare() does not work as expected if cluster 
resource is empty  (was: DominantRC#compare() does not work as expected if 
cluster resource is empty)

> Fix DominantRC#compare() does not work as expected if cluster resource is 
> empty
> ---
>
> Key: YARN-3733
> URL: https://issues.apache.org/jira/browse/YARN-3733
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.0
> Environment: SUSE 11 SP3, 2 NM, 2 RM
> one NM: 3 GB, 6 vcores
>Reporter: Bibin A Chundatt
>Assignee: Rohith
>Priority: Blocker
> Attachments: 0001-YARN-3733.patch, 0002-YARN-3733.patch, 
> 0002-YARN-3733.patch, YARN-3733.patch
>
>
> Steps to reproduce
> =
> 1. Install HA with 2 RM 2 NM (3072 MB * 2 total cluster)
> 2. Configure map and reduce size to 512 MB  after changing scheduler minimum 
> size to 512 MB
> 3. Configure capacity scheduler and AM limit to .5 
> (DominantResourceCalculator is configured)
> 4. Submit 30 concurrent tasks
> 5. Switch RM
> Actual
> =
> AMs get allocated for 12 jobs and all 12 start running
> No other YARN child is initiated, *all 12 jobs stay in RUNNING state forever*
> Expected
> ===
> Only 6 should be running at a time since the max AM limit is .5 (3072 MB)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573034#comment-14573034
 ] 

Hudson commented on YARN-3762:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2164 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2164/])
YARN-3762. FairScheduler: CME on FSParentQueue#getQueueUserAclInfo. (kasha) 
(kasha: rev edb9cd0f7aa1ecaf34afaa120e3d79583e0ec689)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java


> FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
> ---
>
> Key: YARN-3762
> URL: https://issues.apache.org/jira/browse/YARN-3762
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch
>
>
> In our testing, we ran into the following ConcurrentModificationException:
> {noformat}
> halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0
> 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, 
> queueName=root.testyarnpool3, queueCurrentCapacity=0.0, 
> queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0
> 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client
> java.util.ConcurrentModificationException: 
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
>   at java.util.ArrayList$Itr.next(ArrayList.java:851)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880)
> {noformat}
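The trace is the classic fail-fast iterator race: one thread walks an unsynchronized ArrayList of child queues while another thread adds to it. A minimal, standalone reproduction of that failure mode (plain Java, not the FairScheduler code):
{code}
import java.util.ArrayList;
import java.util.List;

public class CmeDemo {
  public static void main(String[] args) throws Exception {
    List<String> childQueues = new ArrayList<>();
    for (int i = 0; i < 100_000; i++) {
      childQueues.add("queue-" + i);
    }

    // The writer keeps mutating the list, the way queue creation mutates
    // FSParentQueue's children while getQueueUserAclInfo() iterates them.
    Thread writer = new Thread(() -> {
      for (int i = 0; i < 100_000; i++) {
        childQueues.add("new-queue-" + i);
      }
    });
    writer.start();

    int total = 0;
    // Unsynchronized iteration usually trips the iterator's modCount check
    // and throws ConcurrentModificationException, as in the stack trace above.
    for (String q : childQueues) {
      total += q.length();
    }
    writer.join();
    System.out.println(total);
  }
}
{code}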



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573039#comment-14573039
 ] 

Hudson commented on YARN-3749:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2164 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2164/])
YARN-3749. We should make a copy of configuration when init (xgong: rev 
5766a04428f65bb008b5c451f6f09e61e1000300)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/HATestUtil.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceProtocolOnHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java


> We should make a copy of configuration when init MiniYARNCluster with 
> multiple RMs
> --
>
> Key: YARN-3749
> URL: https://issues.apache.org/jira/browse/YARN-3749
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chun Chen
>Assignee: Chun Chen
> Fix For: 2.8.0
>
> Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, 
> YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, 
> YARN-3749.patch
>
>
> When I was trying to write a test case for YARN-2674, I found the DS client 
> trying to connect to both rm1 and rm2 with the same address 0.0.0.0:18032 
> during RM failover, even though I initially set 
> yarn.resourcemanager.address.rm1=0.0.0.0:18032 and 
> yarn.resourcemanager.address.rm2=0.0.0.0:28032. After digging, I found it is 
> in ClientRMService where the value of yarn.resourcemanager.address.rm2 is 
> changed to 0.0.0.0:18032. See the following code in ClientRMService:
> {code}
> clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
>                                            YarnConfiguration.RM_ADDRESS,
>                                            YarnConfiguration.DEFAULT_RM_ADDRESS,
>                                            server.getListenerAddress());
> {code}
> Since rm1 and rm2 share the same Configuration instance, and both RMs are 
> initialized before either is started, yarn.resourcemanager.ha.id is set to rm2 
> during the init of rm2 and is therefore still rm2 when rm1 starts.
> So I think it is safe to make a copy of the configuration when initializing 
> each RM.
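A minimal sketch of that suggestion (test-setup style code under stated assumptions, not the committed patch): give each RM its own copy of the configuration so that updateConnectAddr() and the HA id set during rm2's init cannot leak into rm1.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class PerRmConf {
  // YarnConfiguration(Configuration) copies the properties, so each RM gets an
  // independent view; mutating one copy no longer affects the other RM.
  static YarnConfiguration confForRm(Configuration base, String rmId) {
    YarnConfiguration copy = new YarnConfiguration(base);
    copy.set(YarnConfiguration.RM_HA_ID, rmId);
    return copy;
  }
}
{code}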



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573043#comment-14573043
 ] 

Hudson commented on YARN-3751:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2164 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2164/])
YARN-3751. Fixed AppInfo to check if used resources are null. Contributed by 
Sunil G. (zjshen: rev dbc4f64937ea2b4c941a3ac49afc4eeba3f5b763)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java


> TestAHSWebServices fails after YARN-3467
> 
>
> Key: YARN-3751
> URL: https://issues.apache.org/jira/browse/YARN-3751
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Sunil G
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3751.patch
>
>
> YARN-3467 changed AppInfo and assumed that the used resources are never null. 
> That is not true, as this information is not published to the timeline server.
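A minimal sketch of the null guard the fix describes (hypothetical types, not the actual AppInfo code), since applications served from the timeline store may carry no usage information at all:
{code}
public class UsedResourcesGuardDemo {
  static class Resource { final long memoryMb; Resource(long m) { memoryMb = m; } }

  // Treat "used resources" as optional and fall back to a sentinel value.
  static long usedMemoryOrDefault(Resource used, long dflt) {
    return used == null ? dflt : used.memoryMb;
  }

  public static void main(String[] args) {
    System.out.println(usedMemoryOrDefault(null, -1));               // -1
    System.out.println(usedMemoryOrDefault(new Resource(2048), -1)); // 2048
  }
}
{code}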



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573045#comment-14573045
 ] 

Hudson commented on YARN-3585:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2164 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2164/])
YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery 
is enabled. Contributed by Rohith Sharmaks (jlowe: rev 
e13b671aa510f553f4a6a232b4694b6a4cce88ae)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java


> NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
> --
>
> Key: YARN-3585
> URL: https://issues.apache.org/jira/browse/YARN-3585
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>Assignee: Rohith
>Priority: Critical
> Fix For: 2.7.1
>
> Attachments: 0001-YARN-3585.patch, YARN-3585.patch
>
>
> With NM recovery enabled, after decommission the NodeManager log shows it has 
> stopped, but the process cannot exit.
> Non-daemon threads:
> {noformat}
> "DestroyJavaVM" prio=10 tid=0x7f3460011800 nid=0x29ec waiting on 
> condition [0x]
> "leveldb" prio=10 tid=0x7f3354001800 nid=0x2a97 runnable 
> [0x]
> "VM Thread" prio=10 tid=0x7f3460167000 nid=0x29f8 runnable 
> "Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x7f346002 
> nid=0x29ed runnable 
> "Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x7f3460022000 
> nid=0x29ee runnable 
> "Gang worker#2 (Parallel GC Threads)" prio=10 tid=0x7f3460024000 
> nid=0x29ef runnable 
> "Gang worker#3 (Parallel GC Threads)" prio=10 tid=0x7f3460025800 
> nid=0x29f0 runnable 
> "Gang worker#4 (Parallel GC Threads)" prio=10 tid=0x7f3460027800 
> nid=0x29f1 runnable 
> "Gang worker#5 (Parallel GC Threads)" prio=10 tid=0x7f3460029000 
> nid=0x29f2 runnable 
> "Gang worker#6 (Parallel GC Threads)" prio=10 tid=0x7f346002b000 
> nid=0x29f3 runnable 
> "Gang worker#7 (Parallel GC Threads)" prio=10 tid=0x7f346002d000 
> nid=0x29f4 runnable 
> "Concurrent Mark-Sweep GC Thread" prio=10 tid=0x7f3460120800 nid=0x29f7 
> runnable 
> "Gang worker#0 (Parallel CMS Threads)" prio=10 tid=0x7f346011c800 
> nid=0x29f5 runnable 
> "Gang worker#1 (Parallel CMS Threads)" prio=10 tid=0x7f346011e800 
> nid=0x29f6 runnable 
> "VM Periodic Task Thread" prio=10 tid=0x7f346019f800 nid=0x2a01 waiting 
> on condition 
> {noformat}
> and the JNI leveldb thread stack:
> {noformat}
> Thread 12 (Thread 0x7f33dd842700 (LWP 10903)):
> #0  0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib64/libpthread.so.0
> #1  0x7f33dfce2a3b in leveldb::(anonymous 
> namespace)::PosixEnv::BGThreadWrapper(void*) () from 
> /tmp/libleveldbjni-64-1-6922178968300745716.8
> #2  0x003d83407851 in start_thread () from /lib64/libpthread.so.0
> #3  0x003d830e811d in clone () from /lib64/libc.so.6
> {noformat}
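The dump above shows only DestroyJavaVM, the JVM's internal GC/VM threads, and the leveldb JNI background thread, and that leveldb thread is non-daemon. A tiny standalone example of the JVM rule at play (plain Java, not NM code): any non-daemon thread that is still parked keeps the process alive after everything else has finished.
{code}
public class NonDaemonBlocksExit {
  public static void main(String[] args) {
    Thread parked = new Thread(() -> {
      try {
        Thread.sleep(Long.MAX_VALUE);   // parked forever, like the leveldb BG thread
      } catch (InterruptedException ignored) {
        // interrupted -> thread ends and the JVM can exit
      }
    }, "leveldb-like");
    // parked.setDaemon(true);   // uncomment this and the process exits right away
    parked.start();
    System.out.println("main() returned, but the process stays up");
  }
}
{code}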



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573040#comment-14573040
 ] 

Hudson commented on YARN-41:


FAILURE: Integrated in Hadoop-Mapreduce-trunk #2164 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2164/])
YARN-41. The RM should handle the graceful shutdown of the NM. Contributed by 
Devaraj K. (junping_du: rev d7e7f6aa03c67b6a6ccf664adcb06d90bc963e58)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/ResourceTracker.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYSCRPCFactories.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/LocalRMInterface.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestResourceTrackerPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/ResourceTracker.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClusterMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/client/ResourceTrackerPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ClusterMetricsInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceTrackerPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeEventType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/ha

[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573041#comment-14573041
 ] 

Hudson commented on YARN-1462:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2164 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2164/])
Revert "YARN-1462. Correct fix version from branch-2.7.1 to branch-2.8 in" 
(zjshen: rev 4eec2fd132a7c3d100f2124b99ca8cd7befa27c7)
* hadoop-yarn-project/CHANGES.txt
Revert "YARN-1462. Made RM write application tags to timeline server and 
exposed them to users via generic history web UI and REST API. Contributed by 
Xuan Gong." (zjshen: rev bc85959eddcb11037e8b9f0e06780b7c3e1cbab6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/NotRunningJob.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestApplicatonReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java


> AHS API and other AHS changes to handle tags for completed MR jobs
> --
>
> Key: YARN-1462
> URL: https://issues.apache.org/jira/browse/YARN-1462
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.2.0
>Reporter: Karthik Kambatla
>Assignee: Xuan Gong
> Fix For: 2.8.0
>
> Attachments: YARN-1462-branch-2.7-1.2.patch, 
> YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, 
> YARN-1462.3.patch, YARN-1462.4.patch
>
>
> AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573011#comment-14573011
 ] 

Hudson commented on YARN-3749:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #216 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/216/])
YARN-3749. We should make a copy of configuration when init (xgong: rev 
5766a04428f65bb008b5c451f6f09e61e1000300)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceProtocolOnHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/HATestUtil.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java


> We should make a copy of configuration when init MiniYARNCluster with 
> multiple RMs
> --
>
> Key: YARN-3749
> URL: https://issues.apache.org/jira/browse/YARN-3749
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chun Chen
>Assignee: Chun Chen
> Fix For: 2.8.0
>
> Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, 
> YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, 
> YARN-3749.patch
>
>
> When I was trying to write a test case for YARN-2674, I found the DS client 
> trying to connect to both rm1 and rm2 with the same address 0.0.0.0:18032 
> during RM failover, even though I initially set 
> yarn.resourcemanager.address.rm1=0.0.0.0:18032 and 
> yarn.resourcemanager.address.rm2=0.0.0.0:28032. After digging, I found it is 
> in ClientRMService where the value of yarn.resourcemanager.address.rm2 is 
> changed to 0.0.0.0:18032. See the following code in ClientRMService:
> {code}
> clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
>                                            YarnConfiguration.RM_ADDRESS,
>                                            YarnConfiguration.DEFAULT_RM_ADDRESS,
>                                            server.getListenerAddress());
> {code}
> Since rm1 and rm2 share the same Configuration instance, and both RMs are 
> initialized before either is started, yarn.resourcemanager.ha.id is set to rm2 
> during the init of rm2 and is therefore still rm2 when rm1 starts.
> So I think it is safe to make a copy of the configuration when initializing 
> each RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573013#comment-14573013
 ] 

Hudson commented on YARN-1462:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #216 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/216/])
Revert "YARN-1462. Correct fix version from branch-2.7.1 to branch-2.8 in" 
(zjshen: rev 4eec2fd132a7c3d100f2124b99ca8cd7befa27c7)
* hadoop-yarn-project/CHANGES.txt
Revert "YARN-1462. Made RM write application tags to timeline server and 
exposed them to users via generic history web UI and REST API. Contributed by 
Xuan Gong." (zjshen: rev bc85959eddcb11037e8b9f0e06780b7c3e1cbab6)
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestApplicatonReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/NotRunningJob.java


> AHS API and other AHS changes to handle tags for completed MR jobs
> --
>
> Key: YARN-1462
> URL: https://issues.apache.org/jira/browse/YARN-1462
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.2.0
>Reporter: Karthik Kambatla
>Assignee: Xuan Gong
> Fix For: 2.8.0
>
> Attachments: YARN-1462-branch-2.7-1.2.patch, 
> YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, 
> YARN-1462.3.patch, YARN-1462.4.patch
>
>
> AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573017#comment-14573017
 ] 

Hudson commented on YARN-3585:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #216 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/216/])
YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery 
is enabled. Contributed by Rohith Sharmaks (jlowe: rev 
e13b671aa510f553f4a6a232b4694b6a4cce88ae)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* hadoop-yarn-project/CHANGES.txt


> NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
> --
>
> Key: YARN-3585
> URL: https://issues.apache.org/jira/browse/YARN-3585
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>Assignee: Rohith
>Priority: Critical
> Fix For: 2.7.1
>
> Attachments: 0001-YARN-3585.patch, YARN-3585.patch
>
>
> With NM recovery enabled, after decommission the NodeManager log shows it has 
> stopped, but the process cannot exit.
> Non-daemon threads:
> {noformat}
> "DestroyJavaVM" prio=10 tid=0x7f3460011800 nid=0x29ec waiting on 
> condition [0x]
> "leveldb" prio=10 tid=0x7f3354001800 nid=0x2a97 runnable 
> [0x]
> "VM Thread" prio=10 tid=0x7f3460167000 nid=0x29f8 runnable 
> "Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x7f346002 
> nid=0x29ed runnable 
> "Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x7f3460022000 
> nid=0x29ee runnable 
> "Gang worker#2 (Parallel GC Threads)" prio=10 tid=0x7f3460024000 
> nid=0x29ef runnable 
> "Gang worker#3 (Parallel GC Threads)" prio=10 tid=0x7f3460025800 
> nid=0x29f0 runnable 
> "Gang worker#4 (Parallel GC Threads)" prio=10 tid=0x7f3460027800 
> nid=0x29f1 runnable 
> "Gang worker#5 (Parallel GC Threads)" prio=10 tid=0x7f3460029000 
> nid=0x29f2 runnable 
> "Gang worker#6 (Parallel GC Threads)" prio=10 tid=0x7f346002b000 
> nid=0x29f3 runnable 
> "Gang worker#7 (Parallel GC Threads)" prio=10 tid=0x7f346002d000 
> nid=0x29f4 runnable 
> "Concurrent Mark-Sweep GC Thread" prio=10 tid=0x7f3460120800 nid=0x29f7 
> runnable 
> "Gang worker#0 (Parallel CMS Threads)" prio=10 tid=0x7f346011c800 
> nid=0x29f5 runnable 
> "Gang worker#1 (Parallel CMS Threads)" prio=10 tid=0x7f346011e800 
> nid=0x29f6 runnable 
> "VM Periodic Task Thread" prio=10 tid=0x7f346019f800 nid=0x2a01 waiting 
> on condition 
> {noformat}
> and the JNI leveldb thread stack:
> {noformat}
> Thread 12 (Thread 0x7f33dd842700 (LWP 10903)):
> #0  0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib64/libpthread.so.0
> #1  0x7f33dfce2a3b in leveldb::(anonymous 
> namespace)::PosixEnv::BGThreadWrapper(void*) () from 
> /tmp/libleveldbjni-64-1-6922178968300745716.8
> #2  0x003d83407851 in start_thread () from /lib64/libpthread.so.0
> #3  0x003d830e811d in clone () from /lib64/libc.so.6
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573006#comment-14573006
 ] 

Hudson commented on YARN-3762:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #216 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/216/])
YARN-3762. FairScheduler: CME on FSParentQueue#getQueueUserAclInfo. (kasha) 
(kasha: rev edb9cd0f7aa1ecaf34afaa120e3d79583e0ec689)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java


> FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
> ---
>
> Key: YARN-3762
> URL: https://issues.apache.org/jira/browse/YARN-3762
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch
>
>
> In our testing, we ran into the following ConcurrentModificationException:
> {noformat}
> halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0
> 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, 
> queueName=root.testyarnpool3, queueCurrentCapacity=0.0, 
> queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0
> 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client
> java.util.ConcurrentModificationException: 
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
>   at java.util.ArrayList$Itr.next(ArrayList.java:851)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573015#comment-14573015
 ] 

Hudson commented on YARN-3751:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #216 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/216/])
YARN-3751. Fixed AppInfo to check if used resources are null. Contributed by 
Sunil G. (zjshen: rev dbc4f64937ea2b4c941a3ac49afc4eeba3f5b763)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java


> TestAHSWebServices fails after YARN-3467
> 
>
> Key: YARN-3751
> URL: https://issues.apache.org/jira/browse/YARN-3751
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Sunil G
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3751.patch
>
>
> YARN-3467 changed AppInfo and assumed that the used resources are never null. 
> That is not true, as this information is not published to the timeline server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573012#comment-14573012
 ] 

Hudson commented on YARN-41:


FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #216 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/216/])
YARN-41. The RM should handle the graceful shutdown of the NM. Contributed by 
Devaraj K. (junping_du: rev d7e7f6aa03c67b6a6ccf664adcb06d90bc963e58)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/ResourceTracker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClusterMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeStatusUpdaterImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceTrackerPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/LocalRMInterface.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestResourceTrackerPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/ResourceTracker.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeState.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/client/ResourceTrackerPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeEventType.java
* 
hadoop-yarn-project/hadoop-yarn/h

[jira] [Updated] (YARN-2573) Integrate ReservationSystem with the RM failover mechanism

2015-06-04 Thread Anubhav Dhoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Dhoot updated YARN-2573:

Attachment: Design for Reservation HA.pdf

Attaching the design for the umbrella JIRA.

> Integrate ReservationSystem with the RM failover mechanism
> --
>
> Key: YARN-2573
> URL: https://issues.apache.org/jira/browse/YARN-2573
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler, fairscheduler, resourcemanager
>Reporter: Subru Krishnan
>Assignee: Subru Krishnan
> Attachments: Design for Reservation HA.pdf
>
>
> YARN-1051 introduces the ReservationSystem and the current implementation is 
> completely in-memory based. YARN-149 brings in the notion of RM HA with a 
> highly available state store. This JIRA proposes persisting the Plan into the 
> RMStateStore and recovering it post RM failover



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572897#comment-14572897
 ] 

Hudson commented on YARN-3751:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #207 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/207/])
YARN-3751. Fixed AppInfo to check if used resources are null. Contributed by 
Sunil G. (zjshen: rev dbc4f64937ea2b4c941a3ac49afc4eeba3f5b763)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java


> TestAHSWebServices fails after YARN-3467
> 
>
> Key: YARN-3751
> URL: https://issues.apache.org/jira/browse/YARN-3751
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Sunil G
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3751.patch
>
>
> YARN-3467 changed AppInfo and assumed that the used resources are never null. 
> That is not true, as this information is not published to the timeline server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572894#comment-14572894
 ] 

Hudson commented on YARN-3749:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #207 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/207/])
YARN-3749. We should make a copy of configuration when init (xgong: rev 
5766a04428f65bb008b5c451f6f09e61e1000300)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceProtocolOnHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/HATestUtil.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java


> We should make a copy of configuration when init MiniYARNCluster with 
> multiple RMs
> --
>
> Key: YARN-3749
> URL: https://issues.apache.org/jira/browse/YARN-3749
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chun Chen
>Assignee: Chun Chen
> Fix For: 2.8.0
>
> Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, 
> YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, 
> YARN-3749.patch
>
>
> When I was trying to write a test case for YARN-2674, I found the DS client 
> trying to connect to both rm1 and rm2 with the same address 0.0.0.0:18032 
> during RM failover, even though I initially set 
> yarn.resourcemanager.address.rm1=0.0.0.0:18032 and 
> yarn.resourcemanager.address.rm2=0.0.0.0:28032. After digging, I found it is 
> in ClientRMService where the value of yarn.resourcemanager.address.rm2 is 
> changed to 0.0.0.0:18032. See the following code in ClientRMService:
> {code}
> clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
>                                            YarnConfiguration.RM_ADDRESS,
>                                            YarnConfiguration.DEFAULT_RM_ADDRESS,
>                                            server.getListenerAddress());
> {code}
> Since rm1 and rm2 share the same Configuration instance, and both RMs are 
> initialized before either is started, yarn.resourcemanager.ha.id is set to rm2 
> during the init of rm2 and is therefore still rm2 when rm1 starts.
> So I think it is safe to make a copy of the configuration when initializing 
> each RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572899#comment-14572899
 ] 

Hudson commented on YARN-3585:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #207 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/207/])
YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery 
is enabled. Contributed by Rohith Sharmaks (jlowe: rev 
e13b671aa510f553f4a6a232b4694b6a4cce88ae)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java


> NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
> --
>
> Key: YARN-3585
> URL: https://issues.apache.org/jira/browse/YARN-3585
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>Assignee: Rohith
>Priority: Critical
> Fix For: 2.7.1
>
> Attachments: 0001-YARN-3585.patch, YARN-3585.patch
>
>
> With NM recovery enabled, the nodemanager log shows it has stopped after 
> decommission, but the process cannot exit. 
> Non-daemon threads:
> {noformat}
> "DestroyJavaVM" prio=10 tid=0x7f3460011800 nid=0x29ec waiting on 
> condition [0x]
> "leveldb" prio=10 tid=0x7f3354001800 nid=0x2a97 runnable 
> [0x]
> "VM Thread" prio=10 tid=0x7f3460167000 nid=0x29f8 runnable 
> "Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x7f346002 
> nid=0x29ed runnable 
> "Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x7f3460022000 
> nid=0x29ee runnable 
> "Gang worker#2 (Parallel GC Threads)" prio=10 tid=0x7f3460024000 
> nid=0x29ef runnable 
> "Gang worker#3 (Parallel GC Threads)" prio=10 tid=0x7f3460025800 
> nid=0x29f0 runnable 
> "Gang worker#4 (Parallel GC Threads)" prio=10 tid=0x7f3460027800 
> nid=0x29f1 runnable 
> "Gang worker#5 (Parallel GC Threads)" prio=10 tid=0x7f3460029000 
> nid=0x29f2 runnable 
> "Gang worker#6 (Parallel GC Threads)" prio=10 tid=0x7f346002b000 
> nid=0x29f3 runnable 
> "Gang worker#7 (Parallel GC Threads)" prio=10 tid=0x7f346002d000 
> nid=0x29f4 runnable 
> "Concurrent Mark-Sweep GC Thread" prio=10 tid=0x7f3460120800 nid=0x29f7 
> runnable 
> "Gang worker#0 (Parallel CMS Threads)" prio=10 tid=0x7f346011c800 
> nid=0x29f5 runnable 
> "Gang worker#1 (Parallel CMS Threads)" prio=10 tid=0x7f346011e800 
> nid=0x29f6 runnable 
> "VM Periodic Task Thread" prio=10 tid=0x7f346019f800 nid=0x2a01 waiting 
> on condition 
> {noformat}
> and jni leveldb thread stack
> {noformat}
> Thread 12 (Thread 0x7f33dd842700 (LWP 10903)):
> #0  0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib64/libpthread.so.0
> #1  0x7f33dfce2a3b in leveldb::(anonymous 
> namespace)::PosixEnv::BGThreadWrapper(void*) () from 
> /tmp/libleveldbjni-64-1-6922178968300745716.8
> #2  0x003d83407851 in start_thread () from /lib64/libpthread.so.0
> #3  0x003d830e811d in clone () from /lib64/libc.so.6
> {noformat}
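
As a side note on why the process hangs: a single non-daemon thread, like the "leveldb" thread above, is enough to keep the JVM alive after everything else has stopped. A tiny, self-contained Java illustration (plain Java, not NodeManager code) is:

{code}
public class NonDaemonThreadExample {
  public static void main(String[] args) {
    Thread worker = new Thread(() -> {
      try {
        Thread.sleep(Long.MAX_VALUE);   // stands in for the JNI background thread
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
      }
    }, "leveldb-like-worker");
    // worker.setDaemon(true);          // uncomment this and the JVM exits normally
    worker.start();

    System.out.println("main() is done, but the process will not exit");
    // A service in this situation has to either close the resource that owns
    // the thread (so the thread terminates) or exit the JVM explicitly.
  }
}
{code}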



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572895#comment-14572895
 ] 

Hudson commented on YARN-1462:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #207 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/207/])
Revert "YARN-1462. Correct fix version from branch-2.7.1 to branch-2.8 in" 
(zjshen: rev 4eec2fd132a7c3d100f2124b99ca8cd7befa27c7)
* hadoop-yarn-project/CHANGES.txt
Revert "YARN-1462. Made RM write application tags to timeline server and 
exposed them to users via generic history web UI and REST API. Contributed by 
Xuan Gong." (zjshen: rev bc85959eddcb11037e8b9f0e06780b7c3e1cbab6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/NotRunningJob.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestApplicatonReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java


> AHS API and other AHS changes to handle tags for completed MR jobs
> --
>
> Key: YARN-1462
> URL: https://issues.apache.org/jira/browse/YARN-1462
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.2.0
>Reporter: Karthik Kambatla
>Assignee: Xuan Gong
> Fix For: 2.8.0
>
> Attachments: YARN-1462-branch-2.7-1.2.patch, 
> YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, 
> YARN-1462.3.patch, YARN-1462.4.patch
>
>
> AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572889#comment-14572889
 ] 

Hudson commented on YARN-3762:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk-Java8 #207 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/207/])
YARN-3762. FairScheduler: CME on FSParentQueue#getQueueUserAclInfo. (kasha) 
(kasha: rev edb9cd0f7aa1ecaf34afaa120e3d79583e0ec689)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* hadoop-yarn-project/CHANGES.txt


> FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
> ---
>
> Key: YARN-3762
> URL: https://issues.apache.org/jira/browse/YARN-3762
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch
>
>
> In our testing, we ran into the following ConcurrentModificationException:
> {noformat}
> halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0
> 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, 
> queueName=root.testyarnpool3, queueCurrentCapacity=0.0, 
> queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0
> 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client
> java.util.ConcurrentModificationException: 
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
>   at java.util.ArrayList$Itr.next(ArrayList.java:851)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880)
> {noformat}
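
For context, the failure mode in that trace is the standard fail-fast behavior of ArrayList iterators. A self-contained illustration of the mechanism (not the FSParentQueue code or the actual fix) is:

{code}
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

public class CmeExample {
  public static void main(String[] args) {
    List<String> childQueues = new ArrayList<>();
    childQueues.add("root.a");
    childQueues.add("root.b");

    // In the RM the modification comes from another thread while
    // ClientRMService iterates; a same-thread modification trips the exact
    // same checkForComodification() check and is easier to demonstrate.
    try {
      for (String q : childQueues) {
        if (q.equals("root.a")) {
          childQueues.add("root.c");   // structural change during iteration
        }
      }
    } catch (ConcurrentModificationException e) {
      System.out.println("CME, same failure mode as the stack trace above");
    }

    // Typical remedy: iterate over a copy taken under a lock (or guard the
    // list with a read/write lock) so readers always see a stable view.
    for (String q : new ArrayList<>(childQueues)) {
      System.out.println(q);
    }
  }
}
{code}

The snippet only shows the general mechanism and a common remedy; the actual patch presumably guards the child-queue list so that readers get a stable view.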



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-41) The RM should handle the graceful shutdown of the NM.

2015-06-04 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-41:
---
Release Note: The behavior of shutting down an NM can differ (if NM 
work-preserving recovery is not enabled): the NM unregisters from the RM 
immediately rather than waiting for the timeout that would mark it LOST. A new 
node status, SHUTDOWN, is introduced, which can affect the UI, CLI and 
ClusterMetrics for a node's status. 
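
A hypothetical sketch of the decision the release note describes; all names below are invented for illustration, and the real protocol change lives in ResourceTracker.proto and the UnRegisterNodeManager* records listed in the commit:

{code}
public class GracefulShutdownSketch {
  enum SketchNodeStatus { RUNNING, LOST, SHUTDOWN }

  static SketchNodeStatus onNodeManagerStop(boolean workPreservingRecoveryEnabled) {
    if (workPreservingRecoveryEnabled) {
      // Containers may be recovered after a restart, so the node is left to
      // the normal liveness handling instead of being unregistered here.
      return SketchNodeStatus.RUNNING;
    }
    // Otherwise tell the RM right away (stands in for sending an unregister
    // request) instead of waiting for the liveness timeout that would mark
    // the node LOST.
    return SketchNodeStatus.SHUTDOWN;
  }

  public static void main(String[] args) {
    System.out.println(onNodeManagerStop(false));  // SHUTDOWN
    System.out.println(onNodeManagerStop(true));   // RUNNING (left to recovery)
  }
}
{code}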

> The RM should handle the graceful shutdown of the NM.
> -
>
> Key: YARN-41
> URL: https://issues.apache.org/jira/browse/YARN-41
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Ravi Teja Ch N V
>Assignee: Devaraj K
> Fix For: 2.8.0
>
> Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, 
> MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, 
> YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41-7.patch, 
> YARN-41-8.patch, YARN-41.patch
>
>
> Instead of waiting for the NM expiry, the RM should remove and handle an NM 
> that is shut down gracefully.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572869#comment-14572869
 ] 

Hudson commented on YARN-3751:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2146 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2146/])
YARN-3751. Fixed AppInfo to check if used resources are null. Contributed by 
Sunil G. (zjshen: rev dbc4f64937ea2b4c941a3ac49afc4eeba3f5b763)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java
* hadoop-yarn-project/CHANGES.txt


> TestAHSWebServices fails after YARN-3467
> 
>
> Key: YARN-3751
> URL: https://issues.apache.org/jira/browse/YARN-3751
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Sunil G
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3751.patch
>
>
> YARN-3467 changed AppInfo and assumed that the used resource is never null. 
> That is not true, because this information is not published to the timeline 
> server.
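
For illustration, the kind of null guard the fix describes looks roughly like the sketch below; the field and type names are invented stand-ins, not the actual AppInfo members:

{code}
public class AppInfoNullGuardSketch {
  static final long UNAVAILABLE = -1;

  // Stand-in for an ApplicationResourceUsageReport-style object.
  static class UsageReport {
    long memoryMB;
    long vcores;
  }

  long allocatedMemoryMB = UNAVAILABLE;
  long allocatedVCores = UNAVAILABLE;

  void fillUsedResources(UsageReport usage) {
    // For apps served from the timeline store this report can legitimately be
    // null, so only copy the values when it is present.
    if (usage != null) {
      allocatedMemoryMB = usage.memoryMB;
      allocatedVCores = usage.vcores;
    }
  }

  public static void main(String[] args) {
    AppInfoNullGuardSketch info = new AppInfoNullGuardSketch();
    info.fillUsedResources(null);               // no NPE, defaults are kept
    System.out.println(info.allocatedMemoryMB); // -1
  }
}
{code}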



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572867#comment-14572867
 ] 

Hudson commented on YARN-1462:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2146 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2146/])
Revert "YARN-1462. Correct fix version from branch-2.7.1 to branch-2.8 in" 
(zjshen: rev 4eec2fd132a7c3d100f2124b99ca8cd7befa27c7)
* hadoop-yarn-project/CHANGES.txt
Revert "YARN-1462. Made RM write application tags to timeline server and 
exposed them to users via generic history web UI and REST API. Contributed by 
Xuan Gong." (zjshen: rev bc85959eddcb11037e8b9f0e06780b7c3e1cbab6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestApplicatonReport.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/NotRunningJob.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java
* hadoop-yarn-project/CHANGES.txt


> AHS API and other AHS changes to handle tags for completed MR jobs
> --
>
> Key: YARN-1462
> URL: https://issues.apache.org/jira/browse/YARN-1462
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.2.0
>Reporter: Karthik Kambatla
>Assignee: Xuan Gong
> Fix For: 2.8.0
>
> Attachments: YARN-1462-branch-2.7-1.2.patch, 
> YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, 
> YARN-1462.3.patch, YARN-1462.4.patch
>
>
> AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572866#comment-14572866
 ] 

Hudson commented on YARN-3749:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2146 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2146/])
YARN-3749. We should make a copy of configuration when init (xgong: rev 
5766a04428f65bb008b5c451f6f09e61e1000300)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceProtocolOnHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/HATestUtil.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java


> We should make a copy of configuration when init MiniYARNCluster with 
> multiple RMs
> --
>
> Key: YARN-3749
> URL: https://issues.apache.org/jira/browse/YARN-3749
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chun Chen
>Assignee: Chun Chen
> Fix For: 2.8.0
>
> Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, 
> YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, 
> YARN-3749.patch
>
>
> When I was trying to write a test case for YARN-2674, I found the DS client 
> trying to connect to both rm1 and rm2 at the same address 0.0.0.0:18032 
> during RM failover, even though I had initially set 
> yarn.resourcemanager.address.rm1=0.0.0.0:18032 and 
> yarn.resourcemanager.address.rm2=0.0.0.0:28032. After digging, I found that 
> it is ClientRMService that changes the value of 
> yarn.resourcemanager.address.rm2 to 0.0.0.0:18032. See the following code in 
> ClientRMService:
> {code}
> clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
>     YarnConfiguration.RM_ADDRESS,
>     YarnConfiguration.DEFAULT_RM_ADDRESS,
>     server.getListenerAddress());
> {code}
> Since rm1 and rm2 share the same Configuration instance, and both RMs are 
> initialized before either is started, yarn.resourcemanager.ha.id is switched 
> to rm2 while initializing rm2 and is therefore still rm2 when rm1 starts.
> So I think it is safe to make a copy of the configuration when initializing 
> each RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572871#comment-14572871
 ] 

Hudson commented on YARN-3585:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2146 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2146/])
YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery 
is enabled. Contributed by Rohith Sharmaks (jlowe: rev 
e13b671aa510f553f4a6a232b4694b6a4cce88ae)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java


> NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
> --
>
> Key: YARN-3585
> URL: https://issues.apache.org/jira/browse/YARN-3585
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>Assignee: Rohith
>Priority: Critical
> Fix For: 2.7.1
>
> Attachments: 0001-YARN-3585.patch, YARN-3585.patch
>
>
> With NM recovery enabled, the nodemanager log shows it has stopped after 
> decommission, but the process cannot exit. 
> Non-daemon threads:
> {noformat}
> "DestroyJavaVM" prio=10 tid=0x7f3460011800 nid=0x29ec waiting on 
> condition [0x]
> "leveldb" prio=10 tid=0x7f3354001800 nid=0x2a97 runnable 
> [0x]
> "VM Thread" prio=10 tid=0x7f3460167000 nid=0x29f8 runnable 
> "Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x7f346002 
> nid=0x29ed runnable 
> "Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x7f3460022000 
> nid=0x29ee runnable 
> "Gang worker#2 (Parallel GC Threads)" prio=10 tid=0x7f3460024000 
> nid=0x29ef runnable 
> "Gang worker#3 (Parallel GC Threads)" prio=10 tid=0x7f3460025800 
> nid=0x29f0 runnable 
> "Gang worker#4 (Parallel GC Threads)" prio=10 tid=0x7f3460027800 
> nid=0x29f1 runnable 
> "Gang worker#5 (Parallel GC Threads)" prio=10 tid=0x7f3460029000 
> nid=0x29f2 runnable 
> "Gang worker#6 (Parallel GC Threads)" prio=10 tid=0x7f346002b000 
> nid=0x29f3 runnable 
> "Gang worker#7 (Parallel GC Threads)" prio=10 tid=0x7f346002d000 
> nid=0x29f4 runnable 
> "Concurrent Mark-Sweep GC Thread" prio=10 tid=0x7f3460120800 nid=0x29f7 
> runnable 
> "Gang worker#0 (Parallel CMS Threads)" prio=10 tid=0x7f346011c800 
> nid=0x29f5 runnable 
> "Gang worker#1 (Parallel CMS Threads)" prio=10 tid=0x7f346011e800 
> nid=0x29f6 runnable 
> "VM Periodic Task Thread" prio=10 tid=0x7f346019f800 nid=0x2a01 waiting 
> on condition 
> {noformat}
> and jni leveldb thread stack
> {noformat}
> Thread 12 (Thread 0x7f33dd842700 (LWP 10903)):
> #0  0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib64/libpthread.so.0
> #1  0x7f33dfce2a3b in leveldb::(anonymous 
> namespace)::PosixEnv::BGThreadWrapper(void*) () from 
> /tmp/libleveldbjni-64-1-6922178968300745716.8
> #2  0x003d83407851 in start_thread () from /lib64/libpthread.so.0
> #3  0x003d830e811d in clone () from /lib64/libc.so.6
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572861#comment-14572861
 ] 

Hudson commented on YARN-3762:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2146 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2146/])
YARN-3762. FairScheduler: CME on FSParentQueue#getQueueUserAclInfo. (kasha) 
(kasha: rev edb9cd0f7aa1ecaf34afaa120e3d79583e0ec689)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* hadoop-yarn-project/CHANGES.txt


> FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
> ---
>
> Key: YARN-3762
> URL: https://issues.apache.org/jira/browse/YARN-3762
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch
>
>
> In our testing, we ran into the following ConcurrentModificationException:
> {noformat}
> halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0
> 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, 
> queueName=root.testyarnpool3, queueCurrentCapacity=0.0, 
> queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0
> 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client
> java.util.ConcurrentModificationException: 
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
>   at java.util.ArrayList$Itr.next(ArrayList.java:851)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3758) The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not working as expected in FairScheduler

2015-06-04 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572657#comment-14572657
 ] 

Naganarasimha G R commented on YARN-3758:
-

Hi [~rohithsharma], this issue is similar to the one raised in YARN-3525. If 
{{yarn.scheduler.minimum-allocation-mb}} is effectively specific to the 
CapacityScheduler, it would be better to rename it to 
{{yarn.scheduler.capacity.minimum-allocation-mb}}, along the lines of the 
suggestion in YARN-3525, so that there is less confusion. Thoughts?

> The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not 
> working as expected in FairScheduler
> 
>
> Key: YARN-3758
> URL: https://issues.apache.org/jira/browse/YARN-3758
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: skrho
>
> Hello there~~
> I have 2 clusters.
> The first cluster has 5 nodes, a single default application queue, the 
> CapacityScheduler, and 8G of physical memory per node.
> The second cluster has 10 nodes, 2 application queues, the FairScheduler, 
> and 230G of physical memory per node.
> Whenever a mapreduce job runs, I want the resourcemanager to set the minimum 
> memory of 256m for each container.
> So I changed the configuration in yarn-site.xml & mapred-site.xml:
> yarn.scheduler.minimum-allocation-mb : 256
> mapreduce.map.java.opts : -Xms256m 
> mapreduce.reduce.java.opts : -Xms256m 
> mapreduce.map.memory.mb : 256 
> mapreduce.reduce.memory.mb : 256 
> On the first cluster, whenever a mapreduce job runs, I can see 256m of used 
> memory in the web console ( http://installedIP:8088/cluster/nodes ).
> But on the second cluster, whenever a mapreduce job runs, I can see 1024m of 
> used memory in the web console ( http://installedIP:8088/cluster/nodes ).
> I know the default memory value is 1024m, so if the memory setting is not 
> changed, the default value applies.
> I have been testing for two weeks, but I don't know why the minimum memory 
> setting is not working on the second cluster.
> Why does this difference happen?
> Is my configuration wrong, or is there a bug?
> Thank you for reading~~



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3758) The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not working as expected in FairScheduler

2015-06-04 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith updated YARN-3758:
-
Summary: The mininum memory setting(yarn.scheduler.minimum-allocation-mb) 
is not working as expected in FairScheduler  (was: The mininum memory 
setting(yarn.scheduler.minimum-allocation-mb) is not working in container)

> The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not 
> working as expected in FairScheduler
> 
>
> Key: YARN-3758
> URL: https://issues.apache.org/jira/browse/YARN-3758
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: skrho
>
> Hello there~~
> I have 2 clusters.
> The first cluster has 5 nodes, a single default application queue, the 
> CapacityScheduler, and 8G of physical memory per node.
> The second cluster has 10 nodes, 2 application queues, the FairScheduler, 
> and 230G of physical memory per node.
> Whenever a mapreduce job runs, I want the resourcemanager to set the minimum 
> memory of 256m for each container.
> So I changed the configuration in yarn-site.xml & mapred-site.xml:
> yarn.scheduler.minimum-allocation-mb : 256
> mapreduce.map.java.opts : -Xms256m 
> mapreduce.reduce.java.opts : -Xms256m 
> mapreduce.map.memory.mb : 256 
> mapreduce.reduce.memory.mb : 256 
> On the first cluster, whenever a mapreduce job runs, I can see 256m of used 
> memory in the web console ( http://installedIP:8088/cluster/nodes ).
> But on the second cluster, whenever a mapreduce job runs, I can see 1024m of 
> used memory in the web console ( http://installedIP:8088/cluster/nodes ).
> I know the default memory value is 1024m, so if the memory setting is not 
> changed, the default value applies.
> I have been testing for two weeks, but I don't know why the minimum memory 
> setting is not working on the second cluster.
> Why does this difference happen?
> Is my configuration wrong, or is there a bug?
> Thank you for reading~~



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3758) The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not working in container

2015-06-04 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572630#comment-14572630
 ] 

Rohith commented on YARN-3758:
--

bq. Is it bug ?
To be clear: is the inconsistent behavior a bug, or was it implemented 
intentionally for the FS?

> The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not 
> working in container
> 
>
> Key: YARN-3758
> URL: https://issues.apache.org/jira/browse/YARN-3758
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: skrho
>
> Hello there~~
> I have 2 clusters.
> The first cluster has 5 nodes, a single default application queue, the 
> CapacityScheduler, and 8G of physical memory per node.
> The second cluster has 10 nodes, 2 application queues, the FairScheduler, 
> and 230G of physical memory per node.
> Whenever a mapreduce job runs, I want the resourcemanager to set the minimum 
> memory of 256m for each container.
> So I changed the configuration in yarn-site.xml & mapred-site.xml:
> yarn.scheduler.minimum-allocation-mb : 256
> mapreduce.map.java.opts : -Xms256m 
> mapreduce.reduce.java.opts : -Xms256m 
> mapreduce.map.memory.mb : 256 
> mapreduce.reduce.memory.mb : 256 
> On the first cluster, whenever a mapreduce job runs, I can see 256m of used 
> memory in the web console ( http://installedIP:8088/cluster/nodes ).
> But on the second cluster, whenever a mapreduce job runs, I can see 1024m of 
> used memory in the web console ( http://installedIP:8088/cluster/nodes ).
> I know the default memory value is 1024m, so if the memory setting is not 
> changed, the default value applies.
> I have been testing for two weeks, but I don't know why the minimum memory 
> setting is not working on the second cluster.
> Why does this difference happen?
> Is my configuration wrong, or is there a bug?
> Thank you for reading~~



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3758) The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not working in container

2015-06-04 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572628#comment-14572628
 ] 

Rohith commented on YARN-3758:
--

I looked into the code for CS and FS. The understanding of the minimum 
allocation, and its behavior, differ across CS and FS.
# CS : This case is straightforward: if a request asks for less than 
min-allocation-mb, the CS normalizes the request to min-allocation-mb, and 
containers are allocated with minimum-allocation-mb. 
# FS : If a request asks for less than min-allocation-mb, the FS normalizes 
the request using the factor {{yarn.scheduler.increment-allocation-mb}}. In 
the example in the description, min-allocation-mb is 256mb, but 
increment-allocation-mb defaults to 1024mb, so containers are always allocated 
1024mb. {{yarn.scheduler.increment-allocation-mb}} has a huge effect, because 
it changes the requested memory and assigns the newly calculated resource.

The behavior is not consistent between CS and FS. I am not sure why an 
additional configuration was introduced in FS. Is it a bug?
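
To make the arithmetic concrete, here is a small self-contained sketch of the two normalization rules as described above. The property names in the comments are the real ones; the rounding code itself is only an illustration, not the schedulers' actual normalization path.

{code}
public class NormalizationSketch {
  // Rounds a request up to the next multiple of 'step'.
  static int roundUp(int requestMB, int stepMB) {
    return ((requestMB + stepMB - 1) / stepMB) * stepMB;
  }

  public static void main(String[] args) {
    int requestMB = 256;

    // CapacityScheduler (as described above): requests below
    // yarn.scheduler.minimum-allocation-mb are raised to that minimum.
    int csMinMB = 256;
    int csAllocated = Math.max(requestMB, csMinMB);

    // FairScheduler (as described above): requests are rounded up to a
    // multiple of yarn.scheduler.increment-allocation-mb (default 1024).
    int fsIncrementMB = 1024;
    int fsAllocated = roundUp(requestMB, fsIncrementMB);

    System.out.println("CS container size: " + csAllocated + " MB"); // 256
    System.out.println("FS container size: " + fsAllocated + " MB"); // 1024
  }
}
{code}

Presumably, setting yarn.scheduler.increment-allocation-mb to 256 on the FairScheduler cluster would make the two clusters behave alike for this request.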

> The mininum memory setting(yarn.scheduler.minimum-allocation-mb) is not 
> working in container
> 
>
> Key: YARN-3758
> URL: https://issues.apache.org/jira/browse/YARN-3758
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: skrho
>
> Hello there~~
> I have 2 clusters.
> The first cluster has 5 nodes, a single default application queue, the 
> CapacityScheduler, and 8G of physical memory per node.
> The second cluster has 10 nodes, 2 application queues, the FairScheduler, 
> and 230G of physical memory per node.
> Whenever a mapreduce job runs, I want the resourcemanager to set the minimum 
> memory of 256m for each container.
> So I changed the configuration in yarn-site.xml & mapred-site.xml:
> yarn.scheduler.minimum-allocation-mb : 256
> mapreduce.map.java.opts : -Xms256m 
> mapreduce.reduce.java.opts : -Xms256m 
> mapreduce.map.memory.mb : 256 
> mapreduce.reduce.memory.mb : 256 
> On the first cluster, whenever a mapreduce job runs, I can see 256m of used 
> memory in the web console ( http://installedIP:8088/cluster/nodes ).
> But on the second cluster, whenever a mapreduce job runs, I can see 1024m of 
> used memory in the web console ( http://installedIP:8088/cluster/nodes ).
> I know the default memory value is 1024m, so if the memory setting is not 
> changed, the default value applies.
> I have been testing for two weeks, but I don't know why the minimum memory 
> setting is not working on the second cluster.
> Why does this difference happen?
> Is my configuration wrong, or is there a bug?
> Thank you for reading~~



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572618#comment-14572618
 ] 

Hudson commented on YARN-1462:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #948 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/948/])
Revert "YARN-1462. Correct fix version from branch-2.7.1 to branch-2.8 in" 
(zjshen: rev 4eec2fd132a7c3d100f2124b99ca8cd7befa27c7)
* hadoop-yarn-project/CHANGES.txt
Revert "YARN-1462. Made RM write application tags to timeline server and 
exposed them to users via generic history web UI and REST API. Contributed by 
Xuan Gong." (zjshen: rev bc85959eddcb11037e8b9f0e06780b7c3e1cbab6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/NotRunningJob.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestApplicatonReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java


> AHS API and other AHS changes to handle tags for completed MR jobs
> --
>
> Key: YARN-1462
> URL: https://issues.apache.org/jira/browse/YARN-1462
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.2.0
>Reporter: Karthik Kambatla
>Assignee: Xuan Gong
> Fix For: 2.8.0
>
> Attachments: YARN-1462-branch-2.7-1.2.patch, 
> YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, 
> YARN-1462.3.patch, YARN-1462.4.patch
>
>
> AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572611#comment-14572611
 ] 

Hudson commented on YARN-3762:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #948 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/948/])
YARN-3762. FairScheduler: CME on FSParentQueue#getQueueUserAclInfo. (kasha) 
(kasha: rev edb9cd0f7aa1ecaf34afaa120e3d79583e0ec689)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java


> FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
> ---
>
> Key: YARN-3762
> URL: https://issues.apache.org/jira/browse/YARN-3762
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch
>
>
> In our testing, we ran into the following ConcurrentModificationException:
> {noformat}
> halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0
> 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, 
> queueName=root.testyarnpool3, queueCurrentCapacity=0.0, 
> queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0
> 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client
> java.util.ConcurrentModificationException: 
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
>   at java.util.ArrayList$Itr.next(ArrayList.java:851)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572622#comment-14572622
 ] 

Hudson commented on YARN-3585:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #948 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/948/])
YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery 
is enabled. Contributed by Rohith Sharmaks (jlowe: rev 
e13b671aa510f553f4a6a232b4694b6a4cce88ae)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java


> NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
> --
>
> Key: YARN-3585
> URL: https://issues.apache.org/jira/browse/YARN-3585
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>Assignee: Rohith
>Priority: Critical
> Fix For: 2.7.1
>
> Attachments: 0001-YARN-3585.patch, YARN-3585.patch
>
>
> With NM recovery enabled, the nodemanager log shows it has stopped after 
> decommission, but the process cannot exit. 
> Non-daemon threads:
> {noformat}
> "DestroyJavaVM" prio=10 tid=0x7f3460011800 nid=0x29ec waiting on 
> condition [0x]
> "leveldb" prio=10 tid=0x7f3354001800 nid=0x2a97 runnable 
> [0x]
> "VM Thread" prio=10 tid=0x7f3460167000 nid=0x29f8 runnable 
> "Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x7f346002 
> nid=0x29ed runnable 
> "Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x7f3460022000 
> nid=0x29ee runnable 
> "Gang worker#2 (Parallel GC Threads)" prio=10 tid=0x7f3460024000 
> nid=0x29ef runnable 
> "Gang worker#3 (Parallel GC Threads)" prio=10 tid=0x7f3460025800 
> nid=0x29f0 runnable 
> "Gang worker#4 (Parallel GC Threads)" prio=10 tid=0x7f3460027800 
> nid=0x29f1 runnable 
> "Gang worker#5 (Parallel GC Threads)" prio=10 tid=0x7f3460029000 
> nid=0x29f2 runnable 
> "Gang worker#6 (Parallel GC Threads)" prio=10 tid=0x7f346002b000 
> nid=0x29f3 runnable 
> "Gang worker#7 (Parallel GC Threads)" prio=10 tid=0x7f346002d000 
> nid=0x29f4 runnable 
> "Concurrent Mark-Sweep GC Thread" prio=10 tid=0x7f3460120800 nid=0x29f7 
> runnable 
> "Gang worker#0 (Parallel CMS Threads)" prio=10 tid=0x7f346011c800 
> nid=0x29f5 runnable 
> "Gang worker#1 (Parallel CMS Threads)" prio=10 tid=0x7f346011e800 
> nid=0x29f6 runnable 
> "VM Periodic Task Thread" prio=10 tid=0x7f346019f800 nid=0x2a01 waiting 
> on condition 
> {noformat}
> and jni leveldb thread stack
> {noformat}
> Thread 12 (Thread 0x7f33dd842700 (LWP 10903)):
> #0  0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib64/libpthread.so.0
> #1  0x7f33dfce2a3b in leveldb::(anonymous 
> namespace)::PosixEnv::BGThreadWrapper(void*) () from 
> /tmp/libleveldbjni-64-1-6922178968300745716.8
> #2  0x003d83407851 in start_thread () from /lib64/libpthread.so.0
> #3  0x003d830e811d in clone () from /lib64/libc.so.6
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572620#comment-14572620
 ] 

Hudson commented on YARN-3751:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #948 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/948/])
YARN-3751. Fixed AppInfo to check if used resources are null. Contributed by 
Sunil G. (zjshen: rev dbc4f64937ea2b4c941a3ac49afc4eeba3f5b763)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java


> TestAHSWebServices fails after YARN-3467
> 
>
> Key: YARN-3751
> URL: https://issues.apache.org/jira/browse/YARN-3751
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Sunil G
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3751.patch
>
>
> YARN-3467 changed AppInfo and assumed that the used resource is never null. 
> That is not true, because this information is not published to the timeline 
> server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572617#comment-14572617
 ] 

Hudson commented on YARN-3749:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #948 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/948/])
YARN-3749. We should make a copy of configuration when init (xgong: rev 
5766a04428f65bb008b5c451f6f09e61e1000300)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceProtocolOnHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/HATestUtil.java


> We should make a copy of configuration when init MiniYARNCluster with 
> multiple RMs
> --
>
> Key: YARN-3749
> URL: https://issues.apache.org/jira/browse/YARN-3749
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chun Chen
>Assignee: Chun Chen
> Fix For: 2.8.0
>
> Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, 
> YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, 
> YARN-3749.patch
>
>
> When I was trying to write a test case for YARN-2674, I found the DS client 
> trying to connect to both rm1 and rm2 at the same address 0.0.0.0:18032 
> during RM failover, even though I had initially set 
> yarn.resourcemanager.address.rm1=0.0.0.0:18032 and 
> yarn.resourcemanager.address.rm2=0.0.0.0:28032. After digging, I found that 
> it is ClientRMService that changes the value of 
> yarn.resourcemanager.address.rm2 to 0.0.0.0:18032. See the following code in 
> ClientRMService:
> {code}
> clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
>     YarnConfiguration.RM_ADDRESS,
>     YarnConfiguration.DEFAULT_RM_ADDRESS,
>     server.getListenerAddress());
> {code}
> Since rm1 and rm2 share the same Configuration instance, and both RMs are 
> initialized before either is started, yarn.resourcemanager.ha.id is switched 
> to rm2 while initializing rm2 and is therefore still rm2 when rm1 starts.
> So I think it is safe to make a copy of the configuration when initializing 
> each RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572609#comment-14572609
 ] 

Hudson commented on YARN-41:


FAILURE: Integrated in Hadoop-trunk-Commit #7963 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7963/])
YARN-41. The RM should handle the graceful shutdown of the NM. Contributed by 
Devaraj K. (junping_du: rev d7e7f6aa03c67b6a6ccf664adcb06d90bc963e58)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdater.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/client/ResourceTrackerPBClientImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClusterMetrics.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerResponsePBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNodeStatusUpdaterForLabels.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeEventType.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerRequest.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/ResourceTracker.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/NodeState.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/ResourceTracker.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/proto/yarn_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/UnRegisterNodeManagerRequestPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/test/java/org/apache/hadoop/yarn/TestYarnServerApiClasses.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/MockNodeStatusUpdater.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ClusterMetricsInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/UnRegisterNodeManagerResponse.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/LocalRMInterface.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/impl/pb/service/ResourceTrackerPBServiceImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoo

[jira] [Commented] (YARN-2674) Distributed shell AM may re-launch containers if RM work preserving restart happens

2015-06-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572604#comment-14572604
 ] 

Hadoop QA commented on YARN-2674:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m 42s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 4 new or modified test files. |
| {color:red}-1{color} | javac |   7m 32s | The applied patch generated  1  
additional warning messages. |
| {color:green}+1{color} | javadoc |   9m 36s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 39s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   1m 30s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |   1m  3s | Tests failed in 
hadoop-yarn-applications-distributedshell. |
| {color:green}+1{color} | yarn tests |   1m 52s | Tests passed in 
hadoop-yarn-server-tests. |
| | |  40m 27s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-applications-distributedshell |
| Failed unit tests | 
hadoop.yarn.applications.distributedshell.TestDistributedShellWithNodeLabels |
|   | hadoop.yarn.applications.distributedshell.TestDSAppMaster |
|   | hadoop.yarn.applications.distributedshell.TestDistributedShell |
|   | hadoop.yarn.applications.distributedshell.TestDistributedShellWithRMHA |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12737533/YARN-2674.3.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / e830207 |
| javac | 
https://builds.apache.org/job/PreCommit-YARN-Build/8193/artifact/patchprocess/diffJavacWarnings.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8193/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-applications-distributedshell.html
 |
| hadoop-yarn-applications-distributedshell test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8193/artifact/patchprocess/testrun_hadoop-yarn-applications-distributedshell.txt
 |
| hadoop-yarn-server-tests test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8193/artifact/patchprocess/testrun_hadoop-yarn-server-tests.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8193/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8193/console |


This message was automatically generated.

> Distributed shell AM may re-launch containers if RM work preserving restart 
> happens
> ---
>
> Key: YARN-2674
> URL: https://issues.apache.org/jira/browse/YARN-2674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Chun Chen
>Assignee: Chun Chen
> Attachments: YARN-2674.1.patch, YARN-2674.2.patch, YARN-2674.3.patch
>
>
> Currently, if an RM work-preserving restart happens while distributed shell is 
> running, the distributed shell AM may re-launch all the containers, including 
> new, running, and completed ones. We must make sure it does not re-launch the 
> running/completed containers.
> We need to remove allocated containers from 
> AMRMClientImpl#remoteRequestsTable once the AM receives them from the RM. 
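As a complementary guard on the AM side (a sketch only, not the attached patch, which addresses AMRMClientImpl#remoteRequestsTable), the distributed shell callback handler could remember which containers it has already launched so a resync after an RM restart does not start them twice; the {{launched}} set and the launch hook are hypothetical names:
{code}
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerId;

// Inside an AMRMClientAsync.CallbackHandler implementation:
private final Set<ContainerId> launched =
    Collections.newSetFromMap(new ConcurrentHashMap<ContainerId, Boolean>());

@Override
public void onContainersAllocated(List<Container> allocated) {
  for (Container container : allocated) {
    // add() returns false for a container we have already seen, e.g. one the
    // RM re-reports after a work-preserving restart; skip it instead of
    // launching it a second time.
    if (!launched.add(container.getId())) {
      continue;
    }
    launchContainerAsync(container);  // hypothetical launch hook
  }
}
{code}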



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1462) AHS API and other AHS changes to handle tags for completed MR jobs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572597#comment-14572597
 ] 

Hudson commented on YARN-1462:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #218 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/218/])
Revert "YARN-1462. Correct fix version from branch-2.7.1 to branch-2.8 in" 
(zjshen: rev 4eec2fd132a7c3d100f2124b99ca8cd7befa27c7)
* hadoop-yarn-project/CHANGES.txt
Revert "YARN-1462. Made RM write application tags to timeline server and 
exposed them to users via generic history web UI and REST API. Contributed by 
Xuan Gong." (zjshen: rev bc85959eddcb11037e8b9f0e06780b7c3e1cbab6)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestAHSClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/SystemMetricsPublisher.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/MockAsm.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/TestSystemMetricsPublisher.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestYARNRunner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/cli/TestYarnCLI.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/NotRunningJob.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestYarnClient.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/metrics/ApplicationCreatedEvent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/metrics/ApplicationMetricsConstants.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ApplicationReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/TestApplicatonReport.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryManagerOnTimelineStore.java
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/ApplicationHistoryManagerImpl.java


> AHS API and other AHS changes to handle tags for completed MR jobs
> --
>
> Key: YARN-1462
> URL: https://issues.apache.org/jira/browse/YARN-1462
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.2.0
>Reporter: Karthik Kambatla
>Assignee: Xuan Gong
> Fix For: 2.8.0
>
> Attachments: YARN-1462-branch-2.7-1.2.patch, 
> YARN-1462-branch-2.7-1.patch, YARN-1462.1.patch, YARN-1462.2.patch, 
> YARN-1462.3.patch, YARN-1462.4.patch
>
>
> AHS related work for tags. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3751) TestAHSWebServices fails after YARN-3467

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572599#comment-14572599
 ] 

Hudson commented on YARN-3751:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #218 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/218/])
YARN-3751. Fixed AppInfo to check if used resources are null. Contributed by 
Sunil G. (zjshen: rev dbc4f64937ea2b4c941a3ac49afc4eeba3f5b763)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/dao/AppInfo.java


> TestAHSWebServices fails after YARN-3467
> 
>
> Key: YARN-3751
> URL: https://issues.apache.org/jira/browse/YARN-3751
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Sunil G
> Fix For: 2.8.0
>
> Attachments: 0001-YARN-3751.patch
>
>
> YARN-3467 changed AppInfo and assumed that the used resources are never null. 
> That is not true, as this information is not published to the timeline server.
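The shape of the fix is a null guard before reading the used resources; a minimal sketch with illustrative variable names (not necessarily those in AppInfo):
{code}
// appReport is an org.apache.hadoop.yarn.api.records.ApplicationReport
int allocatedMB = -1;
int allocatedVCores = -1;
ApplicationResourceUsageReport usage =
    appReport.getApplicationResourceUsageReport();
// The timeline server does not publish used resources, so both the usage
// report and the Resource inside it may be null for completed apps.
if (usage != null && usage.getUsedResources() != null) {
  allocatedMB = usage.getUsedResources().getMemory();
  allocatedVCores = usage.getUsedResources().getVirtualCores();
}
// Otherwise leave the allocated* fields at their defaults instead of throwing NPE.
{code}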



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3585) NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572601#comment-14572601
 ] 

Hudson commented on YARN-3585:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #218 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/218/])
YARN-3585. NodeManager cannot exit on SHUTDOWN event triggered and NM recovery 
is enabled. Contributed by Rohith Sharmaks (jlowe: rev 
e13b671aa510f553f4a6a232b4694b6a4cce88ae)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/NodeManager.java
* hadoop-yarn-project/CHANGES.txt


> NodeManager cannot exit on SHUTDOWN event triggered and NM recovery is enabled
> --
>
> Key: YARN-3585
> URL: https://issues.apache.org/jira/browse/YARN-3585
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Peng Zhang
>Assignee: Rohith
>Priority: Critical
> Fix For: 2.7.1
>
> Attachments: 0001-YARN-3585.patch, YARN-3585.patch
>
>
> With NM recovery enabled, after decommission the NodeManager log shows that it 
> has stopped, but the process does not exit. 
> Non-daemon threads:
> {noformat}
> "DestroyJavaVM" prio=10 tid=0x7f3460011800 nid=0x29ec waiting on 
> condition [0x]
> "leveldb" prio=10 tid=0x7f3354001800 nid=0x2a97 runnable 
> [0x]
> "VM Thread" prio=10 tid=0x7f3460167000 nid=0x29f8 runnable 
> "Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x7f346002 
> nid=0x29ed runnable 
> "Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x7f3460022000 
> nid=0x29ee runnable 
> "Gang worker#2 (Parallel GC Threads)" prio=10 tid=0x7f3460024000 
> nid=0x29ef runnable 
> "Gang worker#3 (Parallel GC Threads)" prio=10 tid=0x7f3460025800 
> nid=0x29f0 runnable 
> "Gang worker#4 (Parallel GC Threads)" prio=10 tid=0x7f3460027800 
> nid=0x29f1 runnable 
> "Gang worker#5 (Parallel GC Threads)" prio=10 tid=0x7f3460029000 
> nid=0x29f2 runnable 
> "Gang worker#6 (Parallel GC Threads)" prio=10 tid=0x7f346002b000 
> nid=0x29f3 runnable 
> "Gang worker#7 (Parallel GC Threads)" prio=10 tid=0x7f346002d000 
> nid=0x29f4 runnable 
> "Concurrent Mark-Sweep GC Thread" prio=10 tid=0x7f3460120800 nid=0x29f7 
> runnable 
> "Gang worker#0 (Parallel CMS Threads)" prio=10 tid=0x7f346011c800 
> nid=0x29f5 runnable 
> "Gang worker#1 (Parallel CMS Threads)" prio=10 tid=0x7f346011e800 
> nid=0x29f6 runnable 
> "VM Periodic Task Thread" prio=10 tid=0x7f346019f800 nid=0x2a01 waiting 
> on condition 
> {noformat}
> and jni leveldb thread stack
> {noformat}
> Thread 12 (Thread 0x7f33dd842700 (LWP 10903)):
> #0  0x003d8340b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib64/libpthread.so.0
> #1  0x7f33dfce2a3b in leveldb::(anonymous 
> namespace)::PosixEnv::BGThreadWrapper(void*) () from 
> /tmp/libleveldbjni-64-1-6922178968300745716.8
> #2  0x003d83407851 in start_thread () from /lib64/libpthread.so.0
> #3  0x003d830e811d in clone () from /lib64/libc.so.6
> {noformat}
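The thread dumps above show the underlying issue. A minimal stand-alone sketch (plain Java, not the YARN-3585 patch) of why the process hangs: a non-daemon thread owned by a native library, here a stand-in for the leveldb JNI worker, keeps the JVM alive after every service has stopped, so the shutdown path has to either close the resource that owns it or exit explicitly.
{code}
public class NonDaemonHang {
  public static void main(String[] args) {
    // Stand-in for the leveldb JNI background thread: non-daemon and long-lived.
    Thread worker = new Thread(new Runnable() {
      @Override
      public void run() {
        while (true) {
          try {
            Thread.sleep(1000);
          } catch (InterruptedException e) {
            return;
          }
        }
      }
    }, "leveldb-stand-in");
    worker.setDaemon(false);
    worker.start();

    System.out.println("all services stopped");
    // main() returns here, but the JVM keeps running because a non-daemon
    // thread is still alive. The shutdown path must close the resource that
    // owns the thread, or terminate explicitly, e.g. System.exit(0).
  }
}
{code}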



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3762) FairScheduler: CME on FSParentQueue#getQueueUserAclInfo

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572591#comment-14572591
 ] 

Hudson commented on YARN-3762:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #218 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/218/])
YARN-3762. FairScheduler: CME on FSParentQueue#getQueueUserAclInfo. (kasha) 
(kasha: rev edb9cd0f7aa1ecaf34afaa120e3d79583e0ec689)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/QueueManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSParentQueue.java


> FairScheduler: CME on FSParentQueue#getQueueUserAclInfo
> ---
>
> Key: YARN-3762
> URL: https://issues.apache.org/jira/browse/YARN-3762
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: yarn-3762-1.patch, yarn-3762-1.patch, yarn-3762-2.patch
>
>
> In our testing, we ran into the following ConcurrentModificationException:
> {noformat}
> halxg.cloudera.com:8042, nodeRackName/rackvb07, nodeNumContainers0
> 15/05/22 13:02:22 INFO distributedshell.Client: Queue info, 
> queueName=root.testyarnpool3, queueCurrentCapacity=0.0, 
> queueMaxCapacity=-1.0, queueApplicationCount=0, queueChildQueueCount=0
> 15/05/22 13:02:22 FATAL distributedshell.Client: Error running Client
> java.util.ConcurrentModificationException: 
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
>   at java.util.ArrayList$Itr.next(ArrayList.java:851)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.getQueueUserAclInfo(FSParentQueue.java:155)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getQueueUserAclInfo(FairScheduler.java:1395)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueUserAcls(ClientRMService.java:880)
> {noformat}
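The usual way to avoid this kind of CME is to make sure no thread iterates the child-queue list while another mutates it. A minimal sketch under that assumption (a hand-rolled queue class, not the actual FSParentQueue) guards the list with a read/write lock and hands out a copy:
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ParentQueueSketch {
  private final List<String> childQueues = new ArrayList<String>();
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  // Mutators take the write lock...
  public void addChildQueue(String name) {
    lock.writeLock().lock();
    try {
      childQueues.add(name);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // ...and readers take the read lock and return a copy, so callers never
  // iterate a list that a concurrent addChildQueue() is modifying.
  public List<String> getQueueUserAclInfo() {
    lock.readLock().lock();
    try {
      return new ArrayList<String>(childQueues);
    } finally {
      lock.readLock().unlock();
    }
  }
}
{code}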



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3749) We should make a copy of configuration when init MiniYARNCluster with multiple RMs

2015-06-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572596#comment-14572596
 ] 

Hudson commented on YARN-3749:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #218 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/218/])
YARN-3749. We should make a copy of configuration when init (xgong: rev 
5766a04428f65bb008b5c451f6f09e61e1000300)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/TestMiniYarnCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMEmbeddedElector.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/HATestUtil.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestApplicationMasterServiceProtocolOnHA.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/ProtocolHATestBase.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMFailover.java


> We should make a copy of configuration when init MiniYARNCluster with 
> multiple RMs
> --
>
> Key: YARN-3749
> URL: https://issues.apache.org/jira/browse/YARN-3749
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Chun Chen
>Assignee: Chun Chen
> Fix For: 2.8.0
>
> Attachments: YARN-3749.2.patch, YARN-3749.3.patch, YARN-3749.4.patch, 
> YARN-3749.5.patch, YARN-3749.6.patch, YARN-3749.7.patch, YARN-3749.7.patch, 
> YARN-3749.patch
>
>
> When I was trying to write a test case for YARN-2674, I found the DS client 
> trying to connect to both rm1 and rm2 with the same address 0.0.0.0:18032 
> when RM failover happened, even though I initially set 
> yarn.resourcemanager.address.rm1=0.0.0.0:18032 and 
> yarn.resourcemanager.address.rm2=0.0.0.0:28032. After digging, I found it is 
> in ClientRMService that the value of yarn.resourcemanager.address.rm2 is 
> changed to 0.0.0.0:18032. See the following code in ClientRMService:
> {code}
> clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
>YarnConfiguration.RM_ADDRESS,
>
> YarnConfiguration.DEFAULT_RM_ADDRESS,
>server.getListenerAddress());
> {code}
> Since rm1 and rm2 share the same Configuration instance, and both RMs are 
> initialized before either is started, yarn.resourcemanager.ha.id is changed to 
> rm2 while initializing rm2 and is therefore still rm2 when rm1 starts.
> So I think it is safer to make a copy of the configuration when initializing 
> each RM.
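A minimal sketch of the proposed approach, using the keys from the description; the rm1/rm2 instances and the surrounding setup are placeholders, not the actual MiniYARNCluster code:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

Configuration base = new YarnConfiguration();
base.set("yarn.resourcemanager.address.rm1", "0.0.0.0:18032");
base.set("yarn.resourcemanager.address.rm2", "0.0.0.0:28032");

// The copy constructor isolates each RM, so initializing rm2 cannot flip
// yarn.resourcemanager.ha.id (or any other key) under rm1's feet.
Configuration rm1Conf = new YarnConfiguration(base);
rm1Conf.set(YarnConfiguration.RM_HA_ID, "rm1");
Configuration rm2Conf = new YarnConfiguration(base);
rm2Conf.set(YarnConfiguration.RM_HA_ID, "rm2");

rm1.init(rm1Conf);  // rm1, rm2: placeholder ResourceManager instances
rm2.init(rm2Conf);
{code}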



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.

2015-06-04 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572584#comment-14572584
 ] 

Junping Du commented on YARN-41:


bq. These findbugs are not related to the patch here.
Agree. Also, the test failure is not related, and the same failure also shows 
up in other patches, like YARN-3248. We should probably file a separate JIRA to 
fix this. Committing the latest patch.

> The RM should handle the graceful shutdown of the NM.
> -
>
> Key: YARN-41
> URL: https://issues.apache.org/jira/browse/YARN-41
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Ravi Teja Ch N V
>Assignee: Devaraj K
> Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, 
> MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, 
> YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41-7.patch, 
> YARN-41-8.patch, YARN-41.patch
>
>
> Instead of waiting for the NM to expire, the RM should remove and handle an NM 
> that is shut down gracefully.
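For illustration only, the NM side of such a graceful unregistration might look like the sketch below; the record and method names follow the new files touched by this patch (e.g. UnRegisterNodeManagerResponse), but the exact factory and method signatures are assumptions:
{code}
// Sketch of the NM shutdown path telling the RM it is going away.
UnRegisterNodeManagerRequest request =
    UnRegisterNodeManagerRequest.newInstance(context.getNodeId());  // assumed factory
resourceTracker.unRegisterNodeManager(request);
// The RM can then remove the node right away instead of waiting for the
// NM liveness monitor to expire it.
{code}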



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2674) Distributed shell AM may re-launch containers if RM work preserving restart happens

2015-06-04 Thread Chun Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572535#comment-14572535
 ] 

Chun Chen commented on YARN-2674:
-

Uploaded YARN-2674.3.patch with a test case and more detailed comments.

> Distributed shell AM may re-launch containers if RM work preserving restart 
> happens
> ---
>
> Key: YARN-2674
> URL: https://issues.apache.org/jira/browse/YARN-2674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Chun Chen
>Assignee: Chun Chen
> Attachments: YARN-2674.1.patch, YARN-2674.2.patch, YARN-2674.3.patch
>
>
> Currently, if an RM work-preserving restart happens while distributed shell is 
> running, the distributed shell AM may re-launch all the containers, including 
> new, running, and completed ones. We must make sure it does not re-launch the 
> running/completed containers.
> We need to remove allocated containers from 
> AMRMClientImpl#remoteRequestsTable once the AM receives them from the RM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2674) Distributed shell AM may re-launch containers if RM work preserving restart happens

2015-06-04 Thread Chun Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun Chen updated YARN-2674:

Attachment: YARN-2674.3.patch

> Distributed shell AM may re-launch containers if RM work preserving restart 
> happens
> ---
>
> Key: YARN-2674
> URL: https://issues.apache.org/jira/browse/YARN-2674
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Chun Chen
>Assignee: Chun Chen
> Attachments: YARN-2674.1.patch, YARN-2674.2.patch, YARN-2674.3.patch
>
>
> Currently, if an RM work-preserving restart happens while distributed shell is 
> running, the distributed shell AM may re-launch all the containers, including 
> new, running, and completed ones. We must make sure it does not re-launch the 
> running/completed containers.
> We need to remove allocated containers from 
> AMRMClientImpl#remoteRequestsTable once the AM receives them from the RM. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2392) add more diags about app retry limits on AM failures

2015-06-04 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572419#comment-14572419
 ] 

Steve Loughran commented on YARN-2392:
--

checkstyle
{code}
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java:1464:
 '" Then click on links to logs of each attempt.\n"' have incorrect indentation 
level 8, expected level should be 10.
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java:1020:
 Line is longer than 80 characters (found 81).
{code}

> add more diags about app retry limits on AM failures
> 
>
> Key: YARN-2392
> URL: https://issues.apache.org/jira/browse/YARN-2392
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: YARN-2392-001.patch, YARN-2392-002.patch, 
> YARN-2392-002.patch
>
>
> # When an app fails, the failure count is shown, but not what the global and 
> local limits are. If the two are different, they should both be printed. 
> # The YARN-2242 strings don't have enough whitespace between the text and the URL.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.

2015-06-04 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572376#comment-14572376
 ] 

Devaraj K commented on YARN-41:
---

{code:xml}
-1  pre-patch   19m 45s Pre-patch trunk has 3 extant Findbugs (version 
3.0.0) warnings.
{code}

These Findbugs warnings are not related to the patch here.

> The RM should handle the graceful shutdown of the NM.
> -
>
> Key: YARN-41
> URL: https://issues.apache.org/jira/browse/YARN-41
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Ravi Teja Ch N V
>Assignee: Devaraj K
> Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, 
> MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, 
> YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41-7.patch, 
> YARN-41-8.patch, YARN-41.patch
>
>
> Instead of waiting for the NM to expire, the RM should remove and handle an NM 
> that is shut down gracefully.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-41) The RM should handle the graceful shutdown of the NM.

2015-06-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572318#comment-14572318
 ] 

Hadoop QA commented on YARN-41:
---

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  20m 58s | Pre-patch trunk has 3 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 12 new or modified test files. |
| {color:green}+1{color} | javac |   7m 45s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 53s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   3m  1s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m 35s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 37s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   6m  3s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 30s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   0m 24s | Tests passed in 
hadoop-yarn-server-common. |
| {color:green}+1{color} | yarn tests |   6m 16s | Tests passed in 
hadoop-yarn-server-nodemanager. |
| {color:red}-1{color} | yarn tests |  50m 29s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| {color:green}+1{color} | yarn tests |   1m 52s | Tests passed in 
hadoop-yarn-server-tests. |
| | | 110m 24s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.security.TestRMDelegationTokens |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12735565/YARN-41-8.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 1bb79c9 |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8192/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8192/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8192/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-nodemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8192/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8192/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-tests test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8192/artifact/patchprocess/testrun_hadoop-yarn-server-tests.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8192/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8192/console |


This message was automatically generated.

> The RM should handle the graceful shutdown of the NM.
> -
>
> Key: YARN-41
> URL: https://issues.apache.org/jira/browse/YARN-41
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager, resourcemanager
>Reporter: Ravi Teja Ch N V
>Assignee: Devaraj K
> Attachments: MAPREDUCE-3494.1.patch, MAPREDUCE-3494.2.patch, 
> MAPREDUCE-3494.patch, YARN-41-1.patch, YARN-41-2.patch, YARN-41-3.patch, 
> YARN-41-4.patch, YARN-41-5.patch, YARN-41-6.patch, YARN-41-7.patch, 
> YARN-41-8.patch, YARN-41.patch
>
>
> Instead of waiting for the NM to expire, the RM should remove and handle an NM 
> that is shut down gracefully.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3017) ContainerID in ResourceManager Log Has Slightly Different Format From AppAttemptID

2015-06-04 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572289#comment-14572289
 ] 

Rohith commented on YARN-3017:
--

Apologies for coming very late to this issue. Could changing the containerId 
format break compatibility when a rolling upgrade has been done with RM HA + 
work-preserving restart enabled? IIUC, with ZKRMStateStore, a rolling upgrade 
can be done now.

> ContainerID in ResourceManager Log Has Slightly Different Format From 
> AppAttemptID
> --
>
> Key: YARN-3017
> URL: https://issues.apache.org/jira/browse/YARN-3017
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: MUFEED USMAN
>Priority: Minor
>  Labels: PatchAvailable
> Attachments: YARN-3017.patch, YARN-3017_1.patch, YARN-3017_2.patch
>
>
> Not sure if this should be filed as a bug or not.
> In the ResourceManager log in the events surrounding the creation of a new
> application attempt,
> ...
> ...
> 2014-11-14 17:45:37,258 INFO
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching
> masterappattempt_1412150883650_0001_000002
> ...
> ...
> The application attempt has the ID format "_1412150883650_0001_000002".
> Whereas the associated ContainerID goes by "_1412150883650_0001_02_000001".
> ...
> ...
> 2014-11-14 17:45:37,260 INFO
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting 
> up
> container Container: [ContainerId: container_1412150883650_0001_02_000001,
> NodeId: n67:55933, NodeHttpAddress: n67:8042, Resource:  vCores:1,
> disks:0.0>, Priority: 0, Token: Token { kind: ContainerToken, service:
> 10.10.70.67:55933 }, ] for AM appattempt_1412150883650_0001_000002
> ...
> ...
> Curious to know whether it is kept like that for a reason. If not, then when
> using filtering tools to, say, grep for events surrounding a specific attempt
> by the numeric ID, information may slip out during troubleshooting.
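A small self-contained example of the two toString formats (a sketch; the expected output in the comments matches the formats described above):
{code}
import org.apache.hadoop.yarn.api.records.ApplicationAttemptId;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ContainerId;

public class IdFormatDemo {
  public static void main(String[] args) {
    ApplicationId appId = ApplicationId.newInstance(1412150883650L, 1);
    ApplicationAttemptId attemptId = ApplicationAttemptId.newInstance(appId, 2);
    ContainerId containerId = ContainerId.newContainerId(attemptId, 1);

    // The attempt ID pads the attempt number to six digits, while the container
    // ID embeds it as two digits, so a grep for one string misses the other.
    System.out.println(attemptId);   // appattempt_1412150883650_0001_000002
    System.out.println(containerId); // container_1412150883650_0001_02_000001
  }
}
{code}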



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)