[jira] [Updated] (YARN-4499) Bad config values of "scheduler.maximum-allocation-vcores"

2015-12-22 Thread Tianyin Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianyin Xu updated YARN-4499:
-
Description: 
Currently, the default value of {{yarn.scheduler.maximum-allocation-vcores}} is 
{{4}}, according to {{YarnConfiguration.java}}

{code}
  public static final String RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES =
  YARN_PREFIX + "scheduler.maximum-allocation-vcores";
  public static final int DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES = 4;
{code}

However, according to 
[yarn-default.xml|https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml],
 this value should be {{32}}.

Yes, this seems to be a doc error, but I feel the default value should be the 
same as {{yarn.nodemanager.resource.cpu-vcores}} (whose default is {{8}}): if 
we have {{8}} cores available for scheduling, there is little reason to allow 
a maximum of only {{4}}.

Cloudera's article on [Tuning the Cluster for MapReduce v2 (YARN)|http://www.cloudera.com/content/www/en-us/documentation/enterprise/5-3-x/topics/cdh_ig_yarn_tuning.html] 
likewise suggests that the maximum allocation 
({{yarn.scheduler.maximum-allocation-vcores}}) is usually set equal to 
{{yarn.nodemanager.resource.cpu-vcores}}.

At the very least, we should fix the documentation. The error is fairly bad: a 
quick web search shows people confused by it, for example,
https://community.cloudera.com/t5/Cloudera-Manager-Installation/yarn-nodemanager-resource-cpu-vcores-and-yarn-scheduler-maximum/td-p/31098

(But more fundamentally, I think the default should be derived automatically 
from the number of cores on the machine.)
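
A minimal sketch of what such an automatic default might look like 
(illustrative only; the property name is the real one, but falling back to 
{{Runtime.availableProcessors()}} is just an assumption about how the core 
count could be detected):

{code}
// Illustrative sketch only, not the actual YARN code: fall back to the
// machine's core count when the property is not set explicitly.
import org.apache.hadoop.conf.Configuration;

public class VcoreDefaultSketch {
  public static final String RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES =
      "yarn.scheduler.maximum-allocation-vcores";

  public static int maxAllocationVcores(Configuration conf) {
    // Number of cores visible to the JVM on this machine.
    int machineCores = Runtime.getRuntime().availableProcessors();
    return conf.getInt(RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES, machineCores);
  }
}
{code}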




  was:
Currently, the default value of {{yarn.scheduler.maximum-allocation-vcores}} is 
{{4}}, according to {{YarnConfiguration.java}}

{code}
  public static final String RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES =
  YARN_PREFIX + "scheduler.maximum-allocation-vcores";
  public static final int DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES = 4;
{code}

However, according to 
[yarn-default.xml|https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml],
 this value should be {{32}}.

Yes, this seems to be a doc error, but I feel the default value should be the 
same as {{yarn.nodemanager.resource.cpu-vcores}} (whose default is {{8}}): if 
we have {{8}} cores available for scheduling, there is little reason to allow 
a maximum of only {{4}}.

Cloudera's article on [Tuning the Cluster for MapReduce v2 (YARN)|http://www.cloudera.com/content/www/en-us/documentation/enterprise/5-3-x/topics/cdh_ig_yarn_tuning.html] 
likewise suggests that the maximum allocation 
({{yarn.scheduler.maximum-allocation-vcores}}) is usually set equal to 
{{yarn.nodemanager.resource.cpu-vcores}}.

The doc error is fairly bad: a quick web search shows people confused by it, 
for example,
https://community.cloudera.com/t5/Cloudera-Manager-Installation/yarn-nodemanager-resource-cpu-vcores-and-yarn-scheduler-maximum/td-p/31098

(But more fundamentally, I think the default should be derived automatically 
from the number of cores on the machine.)





> Bad config values of "scheduler.maximum-allocation-vcores"
> --
>
> Key: YARN-4499
> URL: https://issues.apache.org/jira/browse/YARN-4499
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.7.1, 2.6.2
>Reporter: Tianyin Xu
>
> Currently, the default value of {{yarn.scheduler.maximum-allocation-vcores}} 
> is {{4}}, according to {{YarnConfiguration.java}}
> {code}
>   public static final String RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES =
>   YARN_PREFIX + "scheduler.maximum-allocation-vcores";
>   public static final int DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES = 4;
> {code}
> However, according to 
> [yarn-default.xml|https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml],
>  this value should be {{32}}.
> Yes, this seems to be a doc error, but I feel the default value should be 
> the same as {{yarn.nodemanager.resource.cpu-vcores}} (whose default is 
> {{8}}): if we have {{8}} cores available for scheduling, there is little 
> reason to allow a maximum of only {{4}}.
> Cloudera's article on [Tuning the Cluster for MapReduce v2 (YARN)|http://www.cloudera.com/content/www/en-us/documentation/enterprise/5-3-x/topics/cdh_ig_yarn_tuning.html] 
> likewise suggests that the maximum allocation 
> ({{yarn.scheduler.maximum-allocation-vcores}}) is usually set equal to 
> {{yarn.nodemanager.resource.cpu-vcores}}.
> At the very least, we should fix the documentation. The error is fairly bad: 
> a quick web search shows people confused by it, for example,
> https://community.cloudera.com/t5/Cloudera-Manager-Installation/yarn-nodemanager-resource-cpu-vcores-and-yarn-sch

[jira] [Updated] (YARN-4499) Bad config values of "scheduler.maximum-allocation-vcores"

2015-12-22 Thread Tianyin Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianyin Xu updated YARN-4499:
-
Description: 
Currently, the default value of {{yarn.scheduler.maximum-allocation-vcores}} is 
{{4}}, according to {{YarnConfiguration.java}}

{code}
  public static final String RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES =
  YARN_PREFIX + "scheduler.maximum-allocation-vcores";
  public static final int DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES = 4;
{code}

However, according to 
[yarn-default.xml|https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml],
 this value should be {{32}}.

Yes, this seems to be a doc error, but I feel the default value should be the 
same as {{yarn.nodemanager.resource.cpu-vcores}} (whose default is {{8}}): if 
we have {{8}} cores available for scheduling, there is little reason to allow 
a maximum of only {{4}}.

Cloudera's article on [Tuning the Cluster for MapReduce v2 (YARN)|http://www.cloudera.com/content/www/en-us/documentation/enterprise/5-3-x/topics/cdh_ig_yarn_tuning.html] 
likewise suggests that the maximum allocation 
({{yarn.scheduler.maximum-allocation-vcores}}) is usually set equal to 
{{yarn.nodemanager.resource.cpu-vcores}}.

The doc error is fairly bad: a quick web search shows people confused by it, 
for example,
https://community.cloudera.com/t5/Cloudera-Manager-Installation/yarn-nodemanager-resource-cpu-vcores-and-yarn-scheduler-maximum/td-p/31098

(But more fundamentally, I think the default should be derived automatically 
from the number of cores on the machine.)




  was:
Currently, the default value of {{yarn.scheduler.maximum-allocation-vcores}} is 
{{4}}, according to {{YarnConfiguration.java}}

{code}
  public static final String RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES =
  YARN_PREFIX + "scheduler.maximum-allocation-vcores";
  public static final int DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES = 4;
{code}

However, according to 
[yarn-default.xml|https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml],
 this value should be {{32}}.

Yes, this seems to be a doc error, but I feel the default value should be the 
same as {{yarn.nodemanager.resource.cpu-vcores}} (whose default is {{8}}): if 
we have {{8}} cores available for scheduling, there is little reason to allow 
a maximum of only {{4}}.

Cloudera's article on [Tuning the Cluster for MapReduce v2 (YARN)|http://www.cloudera.com/content/www/en-us/documentation/enterprise/5-3-x/topics/cdh_ig_yarn_tuning.html] 
likewise suggests that the maximum allocation 
({{yarn.scheduler.maximum-allocation-vcores}}) is usually set equal to 
{{yarn.nodemanager.resource.cpu-vcores}}.

The doc error is fairly bad: a quick web search shows people confused by it, 
for example,
https://community.cloudera.com/t5/Cloudera-Manager-Installation/yarn-nodemanager-resource-cpu-vcores-and-yarn-scheduler-maximum/td-p/31098

(More fundamentally, I think the default should be derived automatically from 
the number of cores on the machine.)





> Bad config values of "scheduler.maximum-allocation-vcores"
> --
>
> Key: YARN-4499
> URL: https://issues.apache.org/jira/browse/YARN-4499
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.7.1, 2.6.2
>Reporter: Tianyin Xu
>
> Currently, the default value of {{yarn.scheduler.maximum-allocation-vcores}} 
> is {{4}}, according to {{YarnConfiguration.java}}
> {code}
>   public static final String RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES =
>   YARN_PREFIX + "scheduler.maximum-allocation-vcores";
>   public static final int DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES = 4;
> {code}
> However, according to 
> [yarn-default.xml|https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml],
>  this value should be {{32}}.
> Yes, this seems to be a doc error, but I feel the default value should be 
> the same as {{yarn.nodemanager.resource.cpu-vcores}} (whose default is 
> {{8}}): if we have {{8}} cores available for scheduling, there is little 
> reason to allow a maximum of only {{4}}.
> Cloudera's article on [Tuning the Cluster for MapReduce v2 (YARN)|http://www.cloudera.com/content/www/en-us/documentation/enterprise/5-3-x/topics/cdh_ig_yarn_tuning.html] 
> likewise suggests that the maximum allocation 
> ({{yarn.scheduler.maximum-allocation-vcores}}) is usually set equal to 
> {{yarn.nodemanager.resource.cpu-vcores}}.
> The doc error is fairly bad: a quick web search shows people confused by it, 
> for example,
> https://community.cloudera.com/t5/Cloudera-Manager-Installation/yarn-nodemanager-resource-cpu-vcores-and-yarn-scheduler-maximum/td-p/31098
> (but seriously, I think we should 

[jira] [Updated] (YARN-4499) Bad config values of "scheduler.maximum-allocation-vcores"

2015-12-22 Thread Tianyin Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianyin Xu updated YARN-4499:
-
Description: 
Currently, the default value of {{yarn.scheduler.maximum-allocation-vcores}} is 
{{4}}, according to {{YarnConfiguration.java}}

{code}
  public static final String RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES =
  YARN_PREFIX + "scheduler.maximum-allocation-vcores";
  public static final int DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES = 4;
{code}

However, according to 
[yarn-default.xml|https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml],
 this value should be {{32}}.

Yes, this seems to be a doc error, but I feel the default value should be the 
same as {{yarn.nodemanager.resource.cpu-vcores}} (whose default is {{8}}): if 
we have {{8}} cores available for scheduling, there is little reason to allow 
a maximum of only {{4}}.

Cloudera's article on [Tuning the Cluster for MapReduce v2 (YARN)|http://www.cloudera.com/content/www/en-us/documentation/enterprise/5-3-x/topics/cdh_ig_yarn_tuning.html] 
likewise suggests that the maximum allocation 
({{yarn.scheduler.maximum-allocation-vcores}}) is usually set equal to 
{{yarn.nodemanager.resource.cpu-vcores}}.

The doc error is fairly bad: a quick web search shows people confused by it, 
for example,
https://community.cloudera.com/t5/Cloudera-Manager-Installation/yarn-nodemanager-resource-cpu-vcores-and-yarn-scheduler-maximum/td-p/31098

(More fundamentally, I think the default should be derived automatically from 
the number of cores on the machine.)




  was:
Currently, the default value of {{yarn.scheduler.maximum-allocation-vcores}} is 
{{4}}, according to {{YarnConfiguration.java}}

{code}
  public static final String RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES =
  YARN_PREFIX + "scheduler.maximum-allocation-vcores";
  public static final int DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES = 4;
{code}

However, according to 
[yarn-default.xml|https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml],
 this value should be {{32}}.

Yes, this seems to be a doc error, but I feel the default value should be the 
same as {{yarn.nodemanager.resource.cpu-vcores}}: if we have {{8}} cores 
available for scheduling, there is little reason to allow a maximum of only 
{{4}}.

Cloudera's article on [Tuning the Cluster for MapReduce v2 (YARN)|http://www.cloudera.com/content/www/en-us/documentation/enterprise/5-3-x/topics/cdh_ig_yarn_tuning.html] 
likewise suggests that the maximum allocation 
({{yarn.scheduler.maximum-allocation-vcores}}) is usually set equal to 
{{yarn.nodemanager.resource.cpu-vcores}}.

The doc error is fairly bad: a quick web search shows people confused by it, 
for example,
https://community.cloudera.com/t5/Cloudera-Manager-Installation/yarn-nodemanager-resource-cpu-vcores-and-yarn-scheduler-maximum/td-p/31098

(More fundamentally, I think the default should be derived automatically from 
the number of cores on the machine.)





> Bad config values of "scheduler.maximum-allocation-vcores"
> --
>
> Key: YARN-4499
> URL: https://issues.apache.org/jira/browse/YARN-4499
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.7.1, 2.6.2
>Reporter: Tianyin Xu
>
> Currently, the default value of {{yarn.scheduler.maximum-allocation-vcores}} 
> is {{4}}, according to {{YarnConfiguration.java}}
> {code}
>   public static final String RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES =
>   YARN_PREFIX + "scheduler.maximum-allocation-vcores";
>   public static final int DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES = 4;
> {code}
> However, according to 
> [yarn-default.xml|https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml],
>  this value should be {{32}}.
> Yes, this seems to be a doc error, but I feel the default value should be 
> the same as {{yarn.nodemanager.resource.cpu-vcores}} (whose default is 
> {{8}}): if we have {{8}} cores available for scheduling, there is little 
> reason to allow a maximum of only {{4}}.
> Cloudera's article on [Tuning the Cluster for MapReduce v2 (YARN)|http://www.cloudera.com/content/www/en-us/documentation/enterprise/5-3-x/topics/cdh_ig_yarn_tuning.html] 
> likewise suggests that the maximum allocation 
> ({{yarn.scheduler.maximum-allocation-vcores}}) is usually set equal to 
> {{yarn.nodemanager.resource.cpu-vcores}}.
> The doc error is fairly bad: a quick web search shows people confused by it, 
> for example,
> https://community.cloudera.com/t5/Cloudera-Manager-Installation/yarn-nodemanager-resource-cpu-vcores-and-yarn-scheduler-maximum/td-p/31098
> (seriously, I think we should have an automatic default which is

[jira] [Created] (YARN-4499) Bad config values of "scheduler.maximum-allocation-vcores"

2015-12-22 Thread Tianyin Xu (JIRA)
Tianyin Xu created YARN-4499:


 Summary: Bad config values of "scheduler.maximum-allocation-vcores"
 Key: YARN-4499
 URL: https://issues.apache.org/jira/browse/YARN-4499
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.6.2, 2.7.1
Reporter: Tianyin Xu


Currently, the default value of {{yarn.scheduler.maximum-allocation-vcores}} is 
{{4}}, according to {{YarnConfiguration.java}}

{code}
  public static final String RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES =
  YARN_PREFIX + "scheduler.maximum-allocation-vcores";
  public static final int DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES = 4;
{code}

However, according to 
[yarn-default.xml|https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml],
 this value should be {{32}}.

Yes, this seems to be a doc error, but I feel the default value should be the 
same as {{yarn.nodemanager.resource.cpu-vcores}}: if we have {{8}} cores 
available for scheduling, there is little reason to allow a maximum of only 
{{4}}.

Cloudera's article on [Tuning the Cluster for MapReduce v2 (YARN)|http://www.cloudera.com/content/www/en-us/documentation/enterprise/5-3-x/topics/cdh_ig_yarn_tuning.html] 
likewise suggests that the maximum allocation 
({{yarn.scheduler.maximum-allocation-vcores}}) is usually set equal to 
{{yarn.nodemanager.resource.cpu-vcores}}.

The doc error is fairly bad: a quick web search shows people confused by it, 
for example,
https://community.cloudera.com/t5/Cloudera-Manager-Installation/yarn-nodemanager-resource-cpu-vcores-and-yarn-scheduler-maximum/td-p/31098

(More fundamentally, I think the default should be derived automatically from 
the number of cores on the machine.)






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2882) Add ExecutionType to denote if a container execution is GUARANTEED or OPPORTUNISTIC

2015-12-22 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069226#comment-15069226
 ] 

Arun Suresh commented on YARN-2882:
---

bq. However, I think we still need a flag in ResourceRequest to describe if 
it's a "opportunistic" container or not, correct? Otherwise RM/LocalRM cannot 
decide if it can take risk to allocate a queueable/oversubscribe container.
[~leftnoteasy], as [~ka...@cloudera.com] mentioned, we had a pretty lengthy 
discussion about this, and we finally came to the consensus that we disallow 
the AM from making that decision.
# Firstly, the AM might not be in a position to know whether a request can be 
handled in a guaranteed or opportunistic manner. Instead, a pluggable policy 
on the NM can filter some Resource Requests (for example, Distributed 
Scheduling can first target non-strict-locality requests), and this can evolve 
into really dynamic policies based on load etc.
# Secondly, this would ensure that delinquent AMs won't game the scheduler by 
asking for only guaranteed resources.
Hope this makes sense.
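
To make the discussion concrete, here is a rough sketch (the two execution 
types come from this JIRA; the pluggable NM-side policy interface and its 
names below are hypothetical illustrations, not the committed API):

{code}
// Sketch only: GUARANTEED/OPPORTUNISTIC come from this JIRA; the policy
// interface below is a hypothetical illustration of the NM-side decision.
import org.apache.hadoop.yarn.api.records.ResourceRequest;

public class ExecutionTypeSketch {
  public enum ExecutionType { GUARANTEED, OPPORTUNISTIC }

  /** Hypothetical policy plugged into the NM/LocalRM to classify requests. */
  public interface ExecutionTypePolicy {
    ExecutionType classify(ResourceRequest request);
  }

  /** Example policy: only relaxed-locality requests may run opportunistically. */
  public static class RelaxedLocalityPolicy implements ExecutionTypePolicy {
    @Override
    public ExecutionType classify(ResourceRequest request) {
      return request.getRelaxLocality()
          ? ExecutionType.OPPORTUNISTIC : ExecutionType.GUARANTEED;
    }
  }
}
{code}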

> Add ExecutionType to denote if a container execution is GUARANTEED or 
> OPPORTUNISTIC
> ---
>
> Key: YARN-2882
> URL: https://issues.apache.org/jira/browse/YARN-2882
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Konstantinos Karanasos
> Attachments: YARN-2882-yarn-2877.001.patch, 
> YARN-2882-yarn-2877.002.patch, YARN-2882-yarn-2877.003.patch, 
> YARN-2882-yarn-2877.004.patch, yarn-2882.patch
>
>
> This JIRA introduces the notion of container types.
> We propose two initial types of containers: guaranteed-start and queueable 
> containers.
> Guaranteed-start are the existing containers, which are allocated by the 
> central RM and are instantaneously started, once allocated.
> Queueable is a new type of container, which allows containers to be queued in 
> the NM, thus their execution may be arbitrarily delayed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3870) Providing raw container request information for fine scheduling

2015-12-22 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069208#comment-15069208
 ] 

Arun Suresh commented on YARN-3870:
---

Thank you for starting this discussion, [~grey].

Correct me if I am wrong: what you are proposing, I guess, is some way for the 
Scheduler to correlate the expanded Resource Requests back to the original 
request. I do feel this would be genuinely useful, not only from a scheduling 
perspective (e.g. making affinity / anti-affinity scheduling decisions, viz. 
YARN-1042), but also to greatly help improve preemption decisions in the 
FairScheduler, viz. YARN-2154.

This would also be extremely useful for AMs. Currently the MR AM does the 
bookkeeping and matches an allocated container to a ResourceRequest; AMs could 
generally be relieved of this job if an allocated Container Token could easily 
be matched against a Resource Request.

One possible approach could be to have the AMClient generate a unique id for a 
Resource request and tag each of the expanded requests (Node, Rack and ANY) 
with this id. This Id can then be passed around in the 
Container/ContainerTokenIdentifier.

[~ka...@cloudera.com], [~vinodkv], [~leftnoteasy], Thoughts ?
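
One way to picture the unique-id idea (purely illustrative; no such id field 
existed in {{ResourceRequest}} at the time, and the helper below is a 
hypothetical AM-side bookkeeping class):

{code}
// Hypothetical AM-side bookkeeping: assign an id to each raw (pre-expansion)
// request and remember its original preference list, so an allocation that
// echoes the id back could be matched to the raw request.
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class RawRequestRegistry {
  /** What the AM originally asked for, before host/rack/ANY expansion. */
  public static class RawRequest {
    public final List<String> preferredHosts;
    public RawRequest(List<String> preferredHosts) {
      this.preferredHosts = preferredHosts;
    }
  }

  private final Map<String, RawRequest> byId = new ConcurrentHashMap<>();

  /** Register a raw request; tag its expanded host/rack/ANY requests with the id. */
  public String register(RawRequest raw) {
    String id = UUID.randomUUID().toString();
    byId.put(id, raw);
    return id;
  }

  /** Match an allocation carrying the id back to the raw request. */
  public RawRequest lookup(String id) {
    return byId.get(id);
  }
}
{code}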

> Providing raw container request information for fine scheduling
> ---
>
> Key: YARN-3870
> URL: https://issues.apache.org/jira/browse/YARN-3870
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, applications, capacityscheduler, fairscheduler, 
> resourcemanager, scheduler, yarn
>Reporter: Lei Guo
>
> Currently, when the AM sends container requests to the RM and scheduler, it 
> expands each individual container request into host/rack/any form. For 
> instance, if I ask for a container with the preference "host1, host2, host3", 
> all in the same rack rack1, then instead of sending one raw container request 
> with the raw preference list to the RM/Scheduler, the client expands it into 
> 5 different objects: host1, host2, host3, rack1, and any. By the time the 
> scheduler receives this information, the raw request is already lost. This is 
> fine for a single container request, but it causes trouble when dealing with 
> multiple container requests from the same application. Consider this case:
> 6 hosts, two racks:
> rack1 (host1, host2, host3), rack2 (host4, host5, host6)
> When the application requests two containers with different data-locality 
> preferences:
> c1: host1, host2, host4
> c2: host2, host3, host5
> This ends up as the following container request list when the client sends 
> the request to the RM/Scheduler:
> host1: 1 instance
> host2: 2 instances
> host3: 1 instance
> host4: 1 instance
> host5: 1 instance
> rack1: 2 instances
> rack2: 2 instances
> any: 2 instances
> Fundamentally, it is hard for the scheduler to make the right judgment 
> without knowing the raw container request. The situation gets worse when 
> dealing with affinity and anti-affinity, or even gang scheduling, etc.
> We need some way to provide the raw container request information for fine 
> scheduling purposes.
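
For concreteness, the expansion described above can be reproduced with a small 
standalone sketch (the host-to-rack mapping is hard-coded to the example's 
6-host, 2-rack layout):

{code}
// Standalone sketch that reproduces the expanded request counts listed above.
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ExpandRequestsSketch {
  public static void main(String[] args) {
    Map<String, String> rackOf = new LinkedHashMap<>();
    rackOf.put("host1", "rack1"); rackOf.put("host2", "rack1");
    rackOf.put("host3", "rack1"); rackOf.put("host4", "rack2");
    rackOf.put("host5", "rack2"); rackOf.put("host6", "rack2");

    List<List<String>> containerPrefs = Arrays.asList(
        Arrays.asList("host1", "host2", "host4"),   // c1
        Arrays.asList("host2", "host3", "host5"));  // c2

    Map<String, Integer> expanded = new LinkedHashMap<>();
    for (List<String> prefs : containerPrefs) {
      // one host-level request per preferred host
      for (String host : prefs) {
        expanded.merge(host, 1, Integer::sum);
      }
      // one rack-level request per rack touched by this container, plus ANY
      prefs.stream().map(rackOf::get).distinct()
          .forEach(rack -> expanded.merge(rack, 1, Integer::sum));
      expanded.merge("any", 1, Integer::sum);
    }
    // prints the same counts as the list above (host1:1, host2:2, host3:1,
    // host4:1, host5:1, rack1:2, rack2:2, any:2), order may differ
    expanded.forEach((k, v) -> System.out.println(k + ": " + v));
  }
}
{code}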



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4438) Implement RM leader election with curator

2015-12-22 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069202#comment-15069202
 ] 

Karthik Kambatla commented on YARN-4438:


bq. And because ZKRMStateStore is currently in active service, it cannot be 
simply moved to AlwaysOn service. So, I'd like to do it separately to minimize 
the core change in this jira.
Fine with separate JIRA. Not sure I understand why ZKRMStateStore needs to be 
an AlwaysOn service. 

bq. I'd like to change this part for RM to not refresh the configs if shared 
storage based config provider is not enabled.
I was never a fan of the shared-storage-configuration stuff. Now that we have 
it, don't think we can get rid of it until Hadoop 4. How would this change 
look? The RM has an instance of the elector; every time we transition to 
active, will either the RM or the elector check if 
shared-storage-config-provider is enabled and call refresh? 

But yeah, I do see the point of calling these methods directly from RM. 

bq. To avoid a busy loop and rejoining immediately. 
If we rejoin immediately, one of the RMs would become Active. It is not like 
the RM is going to use the cycles for anything else if we sleep. Is the concern 
that Curator may be biased in picking an RM in certain conditions? 

bq. What do you mean by force give-up ? exit RM ?
If leaderLatch.close() throws an exception, when does Curator realize the RM is 
not participating in the election anymore? If not, it might keep electing the 
same RM active? How do we handle this, and how long of a wait is okay? 

bq. Even though RM remains at standby, all services should be already shutdown, 
so there's no harm to the end users ?
Agree, there is no harm. My concern is about availability - having one of the 
RMs active "most" of the time. 

bq. I have one question about ActiveStandbyCheckThread. if we make zkStateStore 
and elector to share the same zkClient, do we still need the 
ActiveStandbyCheckThread ? the elector itself should get notification when the 
connection is lost.

Are you referring to the VerifyActiveStatusThread? The RM loses leadership, 
but the connection can be restored even after it loses it. We could actively 
go and stop the store if it hasn't already stopped. The store would have 
already gotten fenced, so we don't run the risk of corrupting it. So, you are 
right, we might not need that thread.

bq. This is currently what EmbeddedElectorService is doing. If the leadership 
is already lost from zk's perspective, the other RM should take up the 
leadership
You are right, it isn't a big deal. Just realized EmbeddedElectorService does 
the same today. Haven't seen Curator's LeaderLatch code. What happens if this 
RM is subsequently elected leader? Does the transition to Active succeed just 
fine? Or, is it possible it gets stuck in a way it can't transition to active? 
If it gets into such a situation, we should consider crashing it altogether. 

bq. I think leaderLatch could never be null ?
Seeing all the NPEs we have in RM/Scheduler, I would like for us to err on the 
side of caution and do null-checks. If not, we at least need to make it 
consistent everywhere. 

bq. Why does it need to be called outside of if (state == 
HAServiceProtocol.HAServiceState.ACTIVE) ? This is a fresh start, it does not 
need to call reinitialize.
You are right. Sorry for the noise, clearly it has been a while since I looked 
at this code. 
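
For reference, a minimal Curator {{LeaderLatch}} usage sketch (the connect 
string, latch path, and listener bodies are placeholders standing in for the 
RM's transition logic, not the patch under review):

{code}
// Minimal Curator LeaderLatch sketch; connect string, path and listener
// bodies are placeholders for the RM's transitionToActive/Standby handling.
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.framework.recipes.leader.LeaderLatchListener;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class CuratorElectionSketch {
  public static void main(String[] args) throws Exception {
    CuratorFramework client = CuratorFrameworkFactory.newClient(
        "zk1:2181,zk2:2181,zk3:2181", new ExponentialBackoffRetry(1000, 3));
    client.start();

    LeaderLatch latch = new LeaderLatch(client, "/yarn-leader-election/rm");
    latch.addListener(new LeaderLatchListener() {
      @Override
      public void isLeader() {
        // the RM would transition to Active here (and refresh configs only if
        // the shared-storage config provider is enabled, per the discussion)
        System.out.println("became leader");
      }
      @Override
      public void notLeader() {
        // the RM would transition to Standby here
        System.out.println("lost leadership");
      }
    });
    latch.start();     // join the election
    Thread.sleep(60_000);
    latch.close();     // leave the election; a failure here is the
                       // "force give-up" case discussed above
    client.close();
  }
}
{code}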

> Implement RM leader election with curator
> -
>
> Key: YARN-4438
> URL: https://issues.apache.org/jira/browse/YARN-4438
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Jian He
>Assignee: Jian He
> Attachments: YARN-4438.1.patch, YARN-4438.2.patch, YARN-4438.3.patch
>
>
> This is to implement the leader election with Curator instead of the 
> ActiveStandbyElector from the common package; this also avoids adding more 
> configs in common to suit the RM's own needs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-22 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069190#comment-15069190
 ] 

Varun Saxena commented on YARN-4224:


bq. It's about locating resources in the system. /flows will be a query 
endpoint and we can send query parameters there, but /flows/{uid} will locate 
one single flow. What I'm confusing right now is, why do we need to have both 
plural and singular forms /run/{uid} and /runs/{uid}? Will they locate to the 
same run, given the same UID?
Ok. The way the UID endpoints have been set up in the patch is {{\{resource to 
query\}/\{uid required to query that resource\}}}. So {{run}} means a single 
run to query and {{runs}} means multiple runs to query. How else would we 
differentiate between the different endpoints?
And a UID is not required to locate multiple flows, flowruns, apps and 
entities, right?

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3215) Respect labels in CapacityScheduler when computing headroom

2015-12-22 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R reassigned YARN-3215:
---

Assignee: Naganarasimha G R  (was: Wangda Tan)

> Respect labels in CapacityScheduler when computing headroom
> ---
>
> Key: YARN-3215
> URL: https://issues.apache.org/jira/browse/YARN-3215
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Wangda Tan
>Assignee: Naganarasimha G R
>
> In existing CapacityScheduler, when computing headroom of an application, it 
> will only consider "non-labeled" nodes of this application.
> But it is possible the application is asking for labeled resources, so 
> headroom-by-label (like 5G resource available under node-label=red) is 
> required to get better resource allocation and avoid deadlocks such as 
> MAPREDUCE-5928.
> This JIRA could involve both API changes (such as adding a 
> label-to-available-resource map in AllocateResponse) and also internal 
> changes in CapacityScheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3215) Respect labels in CapacityScheduler when computing headroom

2015-12-22 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069175#comment-15069175
 ] 

Naganarasimha G R commented on YARN-3215:
-

Hi [~wangda], as I was discussing with you offline w.r.t. YARN-4225, this jira 
is important in a multi-tenant scenario, since tenants need to know how much 
headroom is available to them. So I want to assign this issue to myself.
Current behavior: for apps, the headroom reported is that of the queue's 
default partition. If the default partition has no nodes configured, then the 
minimum allocation (vcores and MB) is given as the headroom. An even more 
erroneous situation is when the default partition is much larger than some 
other partition: the headroom sent to the app is then larger than what it can 
actually use, which might lead to the app hanging.
I initially propose to send the headroom for all partitions accessible to a 
queue to the app. I will try to work on a POC patch and share it at the earliest.
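
A rough sketch of the per-partition headroom shape this proposal implies (the 
{{Resource}} record is real, but the method and map names are purely 
illustrative):

{code}
// Illustrative only: per-partition (node-label) headroom as a map from label
// to available Resource, instead of a single default-partition value.
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.Resource;

public class LabelHeadroomSketch {
  /** headroom(label) = max(0, queueLimit(label) - queueUsed(label)) */
  public static Map<String, Resource> headroomByLabel(
      Map<String, Resource> queueLimitByLabel,
      Map<String, Resource> queueUsedByLabel) {
    Map<String, Resource> headroom = new HashMap<>();
    for (Map.Entry<String, Resource> e : queueLimitByLabel.entrySet()) {
      Resource limit = e.getValue();
      Resource used = queueUsedByLabel.getOrDefault(
          e.getKey(), Resource.newInstance(0, 0));
      headroom.put(e.getKey(), Resource.newInstance(
          Math.max(0, limit.getMemory() - used.getMemory()),
          Math.max(0, limit.getVirtualCores() - used.getVirtualCores())));
    }
    return headroom;
  }
}
{code}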

> Respect labels in CapacityScheduler when computing headroom
> ---
>
> Key: YARN-3215
> URL: https://issues.apache.org/jira/browse/YARN-3215
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>
> In existing CapacityScheduler, when computing headroom of an application, it 
> will only consider "non-labeled" nodes of this application.
> But it is possible the application is asking for labeled resources, so 
> headroom-by-label (like 5G resource available under node-label=red) is 
> required to get better resource allocation and avoid deadlocks such as 
> MAPREDUCE-5928.
> This JIRA could involve both API changes (such as adding a 
> label-to-available-resource map in AllocateResponse) and also internal 
> changes in CapacityScheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4062) Add the flush and compaction functionality via coprocessors and scanners for flow run table

2015-12-22 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069174#comment-15069174
 ] 

Vrushali C commented on YARN-4062:
--

I will fix the findbug warning and take a look at the whitespace lines. 

> Add the flush and compaction functionality via coprocessors and scanners for 
> flow run table
> ---
>
> Key: YARN-4062
> URL: https://issues.apache.org/jira/browse/YARN-4062
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4062-YARN-2928.1.patch, 
> YARN-4062-feature-YARN-2928.01.patch
>
>
> As part of YARN-3901, coprocessor and scanner is being added for storing into 
> the flow_run table. It also needs a flush & compaction processing in the 
> coprocessor and perhaps a new scanner to deal with the data during flushing 
> and compaction stages. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-22 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069172#comment-15069172
 ] 

Varun Saxena commented on YARN-4224:


bq. So the UID in /entities/{entitytype}/{uid}/ is actually app UID? This make 
the whole endpoint looks really weird... I thought it's an entity UID to locate 
to one timeline entity. However, I think you raised a very useful use case to 
query a certain type of entity for one application. Maybe we'd like to change 
the format of this endpoint to address this case? I don't really feel like the 
current form of the endpoint...
When I wrote the code, I was assuming the delimiter would not be public, so 
the UID had to mandatorily come from the server, and we could not append the 
entity type there.
As we have reached consensus on making the delimiter public, the UI can 
actually append the entity type in front of the UID. But this will have to be 
a special case for the entities endpoint.

bq. So to find one entity with cluster, user, flow, flowrun, appid and entity 
id, we do not have the hierarchical endpoint, but can only get an entity 
through the UID interface? Do we need the hierarchical interface for CLIs?
Not really. In the case of the hierarchical endpoint, flow context information 
can be supplied as optional query parameters, which precludes the need to 
query the flow context first.
For hierarchical queries we also envisage direct queries, rather than only the 
flows->flowruns->apps->entities sequence used when querying via UID. But now 
that the delimiter can be public, the UID endpoint can potentially support a 
direct query as well, so it can have a structure similar to the hierarchical 
query.

I think we can discuss all these points in detail in today's call.
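
To illustrate the UID structure under discussion (the {{!}} delimiter and the 
escaping scheme below are only assumptions for the sketch, not the implemented 
format):

{code}
// Illustration of joining/splitting a UID from hierarchy components such as
// cluster!user!flow!run!app!entity; delimiter and escaping are assumptions.
import java.util.ArrayList;
import java.util.List;

public class TimelineUidSketch {
  private static final char DELIM = '!';
  private static final char ESCAPE = '*';

  public static String join(String... parts) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < parts.length; i++) {
      if (i > 0) sb.append(DELIM);
      for (char c : parts[i].toCharArray()) {
        if (c == DELIM || c == ESCAPE) sb.append(ESCAPE); // escape specials
        sb.append(c);
      }
    }
    return sb.toString();
  }

  public static List<String> split(String uid) {
    List<String> parts = new ArrayList<String>();
    StringBuilder current = new StringBuilder();
    for (int i = 0; i < uid.length(); i++) {
      char c = uid.charAt(i);
      if (c == ESCAPE && i + 1 < uid.length()) {
        current.append(uid.charAt(++i));   // keep the escaped character
      } else if (c == DELIM) {
        parts.add(current.toString());
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    parts.add(current.toString());
    return parts;
  }
}
{code}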

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3480) Recovery may get very slow with lots of services with lots of app-attempts

2015-12-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069147#comment-15069147
 ] 

Hadoop QA commented on YARN-3480:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
20s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
38s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
28s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 29s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 49s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
23s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 155m 0s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12779182/YARN-3480.13.patch |
| JIRA Issue | YARN-3480 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  ch

[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics

2015-12-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069142#comment-15069142
 ] 

Hadoop QA commented on YARN-4304:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 13s 
{color} | {color:red} Patch generated 20 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 (total was 252, now 264). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 4 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 5s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 23s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
23s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 137m 6s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12779183/0007-YARN-4304.pat

[jira] [Updated] (YARN-4498) Application level node labels stats to be available in UI/REST/CLI

2015-12-22 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-4498:
---
Summary: Application level node labels stats to be available in UI/REST/CLI 
 (was: Application level running labels/stats to be available to UI/REST/CLI)

> Application level node labels stats to be available in UI/REST/CLI
> --
>
> Key: YARN-4498
> URL: https://issues.apache.org/jira/browse/YARN-4498
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, client, resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>
> Currently, node label stats per application are not available through 
> REST/CLI/UI, e.g. the labels currently used by all live containers and the 
> total container stats per label for an app.
> Will split this into multiple tasks if the approach is fine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4498) Application level running labels/stats to be available to UI/REST/CLI

2015-12-22 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-4498:
--

 Summary: Application level running labels/stats to be available to 
UI/REST/CLI
 Key: YARN-4498
 URL: https://issues.apache.org/jira/browse/YARN-4498
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt


Currently, node label stats per application are not available through 
REST/CLI/UI, e.g. the labels currently used by all live containers and the 
total container stats per label for an app.

Will split this into multiple tasks if the approach is fine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4327) RM can not renew TIMELINE_DELEGATION_TOKEN in secure clusters

2015-12-22 Thread zhangshilong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069121#comment-15069121
 ] 

zhangshilong commented on YARN-4327:


Yeah, I tried yarn.timeline-service.http-authentication.type=kerberos. Jobs 
could then be submitted, but users cannot access the application history from 
the web app.

> RM can not renew  TIMELINE_DELEGATION_TOKEN in secure clusters
> --
>
> Key: YARN-4327
> URL: https://issues.apache.org/jira/browse/YARN-4327
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager, timelineserver
>Affects Versions: 2.7.1
> Environment: Hadoop 2.7.1; HDFS, YARN, the MR history server and the ATS 
> all use Kerberos security.
> conf like this:
> {code}
> <property>
>   <name>hadoop.security.authorization</name>
>   <value>true</value>
>   <description>Is service-level authorization enabled?</description>
> </property>
> <property>
>   <name>hadoop.security.authentication</name>
>   <value>kerberos</value>
>   <description>Possible values are simple (no authentication), and kerberos</description>
> </property>
> {code}
>Reporter: zhangshilong
>
> bin hadoop 2.7.1
> ATS conf like this:
> {code}
> <property>
>   <name>yarn.timeline-service.http-authentication.type</name>
>   <value>simple</value>
> </property>
> <property>
>   <name>yarn.timeline-service.http-authentication.kerberos.principal</name>
>   <value>HTTP/_h...@xxx.com</value>
> </property>
> <property>
>   <name>yarn.timeline-service.http-authentication.kerberos.keytab</name>
>   <value>/etc/hadoop/keytabs/xxx.keytab</value>
> </property>
> <property>
>   <name>yarn.timeline-service.principal</name>
>   <value>xxx/_h...@xxx.com</value>
> </property>
> <property>
>   <name>yarn.timeline-service.keytab</name>
>   <value>/etc/hadoop/keytabs/xxx.keytab</value>
> </property>
> <property>
>   <name>yarn.timeline-service.best-effort</name>
>   <value>true</value>
> </property>
> <property>
>   <name>yarn.timeline-service.enabled</name>
>   <value>true</value>
> </property>
> {code}
> I'd like to allow everyone to access the ATS over HTTP, as with the RM and 
> HDFS.
> The client can submit a job to the RM and add a TIMELINE_DELEGATION_TOKEN to 
> the AM context, but the RM cannot renew the TIMELINE_DELEGATION_TOKEN, which 
> causes the application to fail.
> RM logs:
> 2015-11-03 11:58:38,191 WARN 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer:
>  Unable to add the application to the delegation token renewer.
> java.io.IOException: Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN, 
> Service: 10.12.38.4:8188, Ident: (owner=yarn-test, renewer=yarn-test, 
> realUser=, issueDate=1446523118046, maxDate=1447127918046, sequenceNumber=9, 
> masterKeyId=2)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:439)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:78)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:847)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:828)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: HTTP status [500], message [Null user]
> at 
> org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:169)
> at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:287)
> at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.renewDelegationToken(DelegationTokenAuthenticator.java:212)
> at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.renewDelegationToken(DelegationTokenAuthenticatedURL.java:414)
> at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$3.run(TimelineClientImpl.java:396)
> at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$3.run(TimelineClientImpl.java:378)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$5.run(TimelineClientImpl.java:451)
> at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:183)
> at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:466)
> at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.renewDelegationToken(TimelineClientImpl.java:400)
> at 
> org.apache.hadoop.yarn.security.client.TimelineDelegationTokenIdentifier$Renewer.renew(TimelineDelegationTok

[jira] [Commented] (YARN-4109) Exception on RM scheduler page loading with labels

2015-12-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069113#comment-15069113
 ] 

Hudson commented on YARN-4109:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9017 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9017/])
YARN-4109. Exception on RM scheduler page loading with labels. (Mohammad 
(rohithsharmaks: rev 8c180a13c82ab9d60f595e6942e35d51024dab53)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java
* hadoop-yarn-project/CHANGES.txt


> Exception on RM scheduler page loading with labels
> --
>
> Key: YARN-4109
> URL: https://issues.apache.org/jira/browse/YARN-4109
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Mohammad Shahid Khan
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: YARN-4109_1.patch
>
>
> Configure node labels and load the scheduler page.
> On each reload of the page, the exception below is thrown in the logs:
> {code}
> 2015-09-03 11:27:08,544 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
> handling URI: /cluster/scheduler
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>   at 
> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
>   at 
> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
>   at 
> com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:139)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
>   at 
> com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
>   at 
> com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
>   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:663)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:291)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:615)
>   at 
> org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.doFilter(RMAuthenticationFilter.java:82)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1211)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>   at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>   at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>   at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> 

[jira] [Created] (YARN-4497) RM might fail to restart when recovering apps whose attempts are missing

2015-12-22 Thread Jun Gong (JIRA)
Jun Gong created YARN-4497:
--

 Summary: RM might fail to restart when recovering apps whose 
attempts are missing
 Key: YARN-4497
 URL: https://issues.apache.org/jira/browse/YARN-4497
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jun Gong
Assignee: Jun Gong


Found the following problem when discussing YARN-3480.

If RM fails to store some attempts in the RMStateStore, there will be missing 
attempts in the RMStateStore. For example, when storing attempt1, attempt2 and 
attempt3, RM successfully stored attempt1 and attempt3 but failed to store 
attempt2. When RM restarts, in *RMAppImpl#recover*, we recover attempts one by 
one; for this case, we will recover attempt1, then attempt2. When recovering 
attempt2, we call *((RMAppAttemptImpl)this.currentAttempt).recover(state)*, 
which first looks up its ApplicationAttemptStateData. Since the data is missing, 
recovery fails at *assert attemptState != null* (*RMAppAttemptImpl#recover*, 
line 880).
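
A minimal sketch (plain Java, not the actual RMAppImpl/RMAppAttemptImpl code; all 
names below are hypothetical) of a more tolerant recovery loop that skips attempts 
whose state was never persisted instead of tripping the assertion:

{code}
import java.util.Map;

class RecoverySketch {
  /** Stand-in for ApplicationAttemptStateData. */
  static final class AttemptState { }

  void recoverAttempts(Map<Integer, AttemptState> storedAttempts, int lastAttemptId) {
    for (int id = 1; id <= lastAttemptId; id++) {
      AttemptState state = storedAttempts.get(id);
      if (state == null) {
        // The attempt was never persisted (e.g. the store call failed);
        // skip it rather than failing the whole RM restart with an assert.
        System.out.println("Skipping missing attempt " + id);
        continue;
      }
      // ... recover the attempt from 'state' ...
    }
  }
}
{code}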




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4497) RM might fail to restart when recovering apps whose attempts are missing

2015-12-22 Thread Jun Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Gong updated YARN-4497:
---
Description: 
Found the following problem when discussing YARN-3480.

If RM fails to store some attempts in the RMStateStore, there will be missing 
attempts in the RMStateStore. For example, when storing attempt1, attempt2 and 
attempt3, RM successfully stored attempt1 and attempt3 but failed to store 
attempt2. When RM restarts, in *RMAppImpl#recover*, we recover attempts one by 
one; for this case, we will recover attempt1, then attempt2. When recovering 
attempt2, we call *((RMAppAttemptImpl)this.currentAttempt).recover(state)*, 
which first looks up its ApplicationAttemptStateData. Since the data is missing, 
recovery fails at *assert attemptState != null* (*RMAppAttemptImpl#recover*, 
line 880).


  was:
Find following problem when discussing in YARN-3480.

If RM fails to store some attempts in RMStateStore, there will be missing 
attempts in RMStateStore, for the case storing attempt1, attempt2 and attempt3, 
RM successfully stored attempt1 and attempt3, but failed to store attempt2. 
When RM restarts, in *RMAppImpl#recover*, we recover attempts one by one, for 
this case, we will recover attmept1, then attempt2. When recovering attempt2, 
we call  *((RMAppAttemptImpl)this.currentAttempt).recover(state)*,
 it will first find its ApplicationAttemptStateData, but it could not find it, 
an error will come at *assert attemptState != null*(*RMAppAttemptImpl#recover*, 
line 880).



> RM might fail to restart when recovering apps whose attempts are missing
> 
>
> Key: YARN-4497
> URL: https://issues.apache.org/jira/browse/YARN-4497
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jun Gong
>Assignee: Jun Gong
>
> Found the following problem when discussing YARN-3480.
> If RM fails to store some attempts in the RMStateStore, there will be missing 
> attempts in the RMStateStore. For example, when storing attempt1, attempt2 and 
> attempt3, RM successfully stored attempt1 and attempt3 but failed to store 
> attempt2. When RM restarts, in *RMAppImpl#recover*, we recover attempts one 
> by one; for this case, we will recover attempt1, then attempt2. When 
> recovering attempt2, we call 
> *((RMAppAttemptImpl)this.currentAttempt).recover(state)*, which first looks up 
> its ApplicationAttemptStateData. Since the data is missing, recovery fails at 
> *assert attemptState != null* (*RMAppAttemptImpl#recover*, line 880).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4492) Add documentation for queue level preemption which is supported in Capacity scheduler

2015-12-22 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069093#comment-15069093
 ] 

Naganarasimha G R commented on YARN-4492:
-

Thanks [~templedf] for the review.
Hope [~jlowe] / [~wangda] (reviewers of YARN-2056) can take a look at this?


> Add documentation for queue level preemption which is supported in Capacity 
> scheduler
> -
>
> Key: YARN-4492
> URL: https://issues.apache.org/jira/browse/YARN-4492
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: CapacityScheduler.html, YARN-4492.v1.001.patch, 
> YARN-4492.v1.002.patch, YARN-4492.v1.003.patch
>
>
> As part of YARN-2056, support has been added to disable preemption for a 
> specific queue. This is a useful feature in a cluster with mixed workloads but 
> is currently missing documentation. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4352) Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient

2015-12-22 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069081#comment-15069081
 ] 

Rohith Sharma K S commented on YARN-4352:
-

[~sunilg], can you check the test failures? I checked the logs, and it seems the 
failures are related to this patch. 

> Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient
> 
>
> Key: YARN-4352
> URL: https://issues.apache.org/jira/browse/YARN-4352
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Sunil G
>  Labels: security
> Attachments: 0001-YARN-4352.patch
>
>
> From 
> https://builds.apache.org/job/PreCommit-YARN-Build/9661/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client-jdk1.7.0_79.txt,
>  we can see the tests in TestYarnClient, TestAMRMClient and TestNMClient get 
> timeout which can be reproduced locally.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4195) Support of node-labels in the ReservationSystem "Plan"

2015-12-22 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069080#comment-15069080
 ] 

Wangda Tan commented on YARN-4195:
--

Hi [~curino],

Thanks for the responses. The "unified" label is a great idea; in the future, we 
could add more features such as affinity/anti-affinity to it (a dynamic label to 
say whether an app can place a container on a node or not). The dimensionality of 
node labels does not sound like an issue since it has a hard limit 
({{|partitions|<=K}}, and YARN-4476).

I still have some questions (just rephrasing my previous questions):
1) If we're configuring a scheduler, should we use the individual labels (A/B) 
or the normalized labels (A/B/A_B)?
2) If a resource request wants to allocate on "A", is it possible to get 
resources on "A_B"? If yes, does that mean the scheduler allocates more 
resources than required? (The app wants 3GB on "A" only, but the scheduler gives 
the app 3GB on "A" and "B".)

> Support of node-labels in the ReservationSystem "Plan"
> --
>
> Key: YARN-4195
> URL: https://issues.apache.org/jira/browse/YARN-4195
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4195.patch
>
>
> As part of YARN-4193 we need to enhance the InMemoryPlan (and related 
> classes) to track the per-label available resources, as well as the per-label
> reservation-allocations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4003) ReservationQueue inherit getAMResourceLimit() from LeafQueue, but behavior is not consistent

2015-12-22 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069062#comment-15069062
 ] 

Sunil G commented on YARN-4003:
---

Hi [~curino]
bq. Assume a reservation queue R1 launches tons of AMs, and now another 
reservation queue R2 is stuck not being able to run any job
Yes, I agree that a sudden spike of usage from one queue can make other queues 
starve. In this case, I agree with the current solution; it can definitely help 
to overcome starvation and adhere to a limit as well (if not in the worst case).

The long-term plan sounds interesting. {{RM scheduling bandwidth}} per queue is 
definitely a good metric here. I assume this metric should always be less than 
queue capacity (not max-capacity), is that so?

{{cost of scheduler bandwidth}} is a metric which has been long pending. However, 
it is more like a ranking too: rank all apps based on their demand 
(resource request) rate, and aggregate that total cost to the queue level. RM can 
make use of this to know which queue has less cost and which has more. So in the 
reservation case, AMs can be restricted more in a queue with a high cost of 
scheduler bandwidth. Thoughts?

> ReservationQueue inherit getAMResourceLimit() from LeafQueue, but behavior is 
> not consistent
> 
>
> Key: YARN-4003
> URL: https://issues.apache.org/jira/browse/YARN-4003
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Carlo Curino
>Assignee: Carlo Curino
> Attachments: YARN-4003.patch
>
>
> The inherited behavior from LeafQueue (limit AM % based on capacity) is not a 
> good fit for ReservationQueue (that have highly dynamic capacity). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics

2015-12-22 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4304:
--
Attachment: 0007-YARN-4304.patch

Hi [~leftnoteasy] 
Thank you for sharing the comments. Uploading a new patch addressing the 
comments.

bq. And I think you can move following code to CapacitySchedulerLeafQueueInfo:
I think if we move it to {{CapacitySchedulerLeafQueueInfo}}, we may need to have 
this calculated for all accessible labels in the queue, so it may need a new 
list in lqInfo. In that case, is it fine to keep it in 
{{CapacitySchedulerPage}} itself? Thoughts?

As discussed offline, we will pre-compute the AMLimit for all labels in the queue 
prior to the loop in {{activateApplications}}, as sketched below.
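
A minimal sketch of that "pre-compute per-label AM limit once, then loop" idea; 
the types and the limit formula here are placeholders, not the actual 
CapacityScheduler code:

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class AmLimitPrecomputeSketch {
  Map<String, Long> precomputeAmLimits(Set<String> accessibleLabels,
      Map<String, Long> partitionResource, float maxAmResourcePercent) {
    Map<String, Long> amLimitPerLabel = new HashMap<>();
    for (String label : accessibleLabels) {
      long total = partitionResource.getOrDefault(label, 0L);
      // Placeholder formula: a fixed percentage of the partition's resource.
      amLimitPerLabel.put(label, (long) (total * maxAmResourcePercent));
    }
    // Computed once; looked up inside the activateApplications() loop.
    return amLimitPerLabel;
  }
}
{code}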

> AM max resource configuration per partition to be displayed/updated correctly 
> in UI and in various partition related metrics
> 
>
> Key: YARN-4304
> URL: https://issues.apache.org/jira/browse/YARN-4304
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: webapp
>Affects Versions: 2.7.1
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: 0001-YARN-4304.patch, 0002-YARN-4304.patch, 
> 0003-YARN-4304.patch, 0004-YARN-4304.patch, 0005-YARN-4304.patch, 
> 0005-YARN-4304.patch, 0006-YARN-4304.patch, 0007-YARN-4304.patch, 
> REST_and_UI.zip
>
>
> As we are supporting per-partition max AM resource percentage configuration, 
> the UI and various metrics also need to display the correct configuration for 
> the same. 
> For example: the current UI still shows the am-resource percentage at queue 
> level. This is to be updated correctly when label config is used.
> - Display max-am-percentage per-partition in the Scheduler UI (label also) and 
> in the ClusterMetrics page
> - Update queue/partition related metrics w.r.t. per-partition 
> am-resource-percentage



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3480) Recovery may get very slow with lots of services with lots of app-attempts

2015-12-22 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069045#comment-15069045
 ] 

Jun Gong commented on YARN-3480:


Discussed with [~jianhe] offline; we think the implementation 
(YARN-3480.12.patch) is a bit complex, and it is OK if the number of attempts 
kept in the store is not exact. So I am reattaching the previous patch 
(YARN-3480.11.patch, renamed to YARN-3480.13.patch).
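
For reference, a minimal sketch of the "keep only the most recent N attempts in 
the store" idea discussed here; it is illustrative only (class and method names 
are made up), not the attached patch:

{code}
import java.util.ArrayDeque;
import java.util.Deque;

class BoundedAttemptHistory {
  private final int maxAttemptsInStore;
  private final Deque<String> storedAttemptIds = new ArrayDeque<>();

  BoundedAttemptHistory(int maxAttemptsInStore) {
    this.maxAttemptsInStore = maxAttemptsInStore;
  }

  /** Returns the id of the oldest attempt to remove from the store, or null. */
  synchronized String onAttemptStored(String attemptId) {
    storedAttemptIds.addLast(attemptId);
    if (storedAttemptIds.size() > maxAttemptsInStore) {
      return storedAttemptIds.removeFirst();
    }
    return null;
  }
}
{code}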

> Recovery may get very slow with lots of services with lots of app-attempts
> --
>
> Key: YARN-3480
> URL: https://issues.apache.org/jira/browse/YARN-3480
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-3480.01.patch, YARN-3480.02.patch, 
> YARN-3480.03.patch, YARN-3480.04.patch, YARN-3480.05.patch, 
> YARN-3480.06.patch, YARN-3480.07.patch, YARN-3480.08.patch, 
> YARN-3480.09.patch, YARN-3480.10.patch, YARN-3480.11.patch, 
> YARN-3480.12.patch, YARN-3480.13.patch
>
>
> When RM HA is enabled and running containers are kept across attempts, apps 
> are more likely to finish successfully with more retries (attempts), so it is 
> better to set 'yarn.resourcemanager.am.max-attempts' larger. However, that 
> makes the RMStateStore (FileSystem/HDFS/ZK) store more attempts and makes the 
> RM recovery process much slower. It might be better to cap the number of 
> attempts stored in the RMStateStore.
> BTW: when 'attemptFailuresValidityInterval' (introduced in YARN-611) is set to 
> a small value, the number of retried attempts might be very large, so we need 
> to delete some of the attempts stored in the RMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4496) Improve HA ResourceManager Failover detection on the client

2015-12-22 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069043#comment-15069043
 ] 

Arun Suresh commented on YARN-4496:
---

Had a discussion with [~subru], and we think this would also be useful for YARN 
federation (YARN-2915).

> Improve HA ResourceManager Failover detection on the client
> ---
>
> Key: YARN-4496
> URL: https://issues.apache.org/jira/browse/YARN-4496
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client, resourcemanager
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>
> HDFS deployments can currently use the {{RequestHedgingProxyProvider}} to 
> improve Namenode failover detection in the client. It does this by 
> concurrently trying all namenodes and picking the namenode that returns a 
> successful response fastest as the active node.
> It would be useful to have a similar ProxyProvider for the Yarn RM (it can 
> possibly be done by converging some of the class hierarchies to use the same 
> ProxyProvider).
> This would especially be useful for large YARN deployments with multiple 
> standby RMs, where clients will be able to pick the active RM without having 
> to traverse a list of configured RMs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-4496) Improve HA ResourceManager Failover detection on the client

2015-12-22 Thread Arun Suresh (JIRA)
Arun Suresh created YARN-4496:
-

 Summary: Improve HA ResourceManager Failover detection on the 
client
 Key: YARN-4496
 URL: https://issues.apache.org/jira/browse/YARN-4496
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client, resourcemanager
Reporter: Arun Suresh
Assignee: Arun Suresh


HDFS deployments can currently use the {{RequestHedgingProxyProvider}} to 
improve Namenode failover detection in the client. It does this by concurrently 
trying all namenodes and picking the namenode that returns a successful response 
fastest as the active node.

It would be useful to have a similar ProxyProvider for the Yarn RM (it can 
possibly be done by converging some of the class hierarchies to use the same 
ProxyProvider).

This would especially be useful for large YARN deployments with multiple 
standby RMs, where clients will be able to pick the active RM without having to 
traverse a list of configured RMs. 
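
The hedging pattern itself is simple to illustrate. Below is a minimal, generic 
sketch (not the actual RequestHedgingProxyProvider or any YARN API; all names are 
hypothetical) that fires the same call at every candidate endpoint and returns the 
first successful response:

{code}
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

public class HedgingCallSketch {
  /** Assumes a non-empty endpoint list. */
  public static <E, R> R callFirstSuccessful(List<E> endpoints,
      Function<E, Callable<R>> callFactory) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(endpoints.size());
    ExecutorCompletionService<R> ecs = new ExecutorCompletionService<>(pool);
    try {
      for (E endpoint : endpoints) {
        ecs.submit(callFactory.apply(endpoint));   // fire all calls at once
      }
      Exception last = null;
      for (int i = 0; i < endpoints.size(); i++) {
        try {
          return ecs.take().get();                 // first successful result wins
        } catch (Exception e) {
          last = e;                                // a standby typically fails fast
        }
      }
      throw last;                                  // every endpoint failed
    } finally {
      pool.shutdownNow();                          // cancel the slower calls
    }
  }
}
{code}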



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3480) Recovery may get very slow with lots of services with lots of app-attempts

2015-12-22 Thread Jun Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Gong updated YARN-3480:
---
Attachment: YARN-3480.13.patch

> Recovery may get very slow with lots of services with lots of app-attempts
> --
>
> Key: YARN-3480
> URL: https://issues.apache.org/jira/browse/YARN-3480
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-3480.01.patch, YARN-3480.02.patch, 
> YARN-3480.03.patch, YARN-3480.04.patch, YARN-3480.05.patch, 
> YARN-3480.06.patch, YARN-3480.07.patch, YARN-3480.08.patch, 
> YARN-3480.09.patch, YARN-3480.10.patch, YARN-3480.11.patch, 
> YARN-3480.12.patch, YARN-3480.13.patch
>
>
> When RM HA is enabled and running containers are kept across attempts, apps 
> are more likely to finish successfully with more retries (attempts), so it is 
> better to set 'yarn.resourcemanager.am.max-attempts' larger. However, that 
> makes the RMStateStore (FileSystem/HDFS/ZK) store more attempts and makes the 
> RM recovery process much slower. It might be better to cap the number of 
> attempts stored in the RMStateStore.
> BTW: when 'attemptFailuresValidityInterval' (introduced in YARN-611) is set to 
> a small value, the number of retried attempts might be very large, so we need 
> to delete some of the attempts stored in the RMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4494) Recover completed apps asynchronously

2015-12-22 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069014#comment-15069014
 ] 

Jun Gong commented on YARN-4494:


[~sunilg] Yes, I think we all agree with the option of "an on-demand fast 
recovery". Sorry for the misunderstanding in my last response.

> Recover completed apps asynchronously
> -
>
> Key: YARN-4494
> URL: https://issues.apache.org/jira/browse/YARN-4494
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Jun Gong
>Assignee: Jun Gong
>
> With RM HA enabled, when recovering apps, recover completed apps 
> asynchronously.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4459) container-executor might kill process wrongly

2015-12-22 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068996#comment-15068996
 ] 

Jun Gong commented on YARN-4459:


Thanks [~Naganarasimha] for the info. Yes, it seems to be the same problem. I 
think the problem does not exist only for 'DelayedProcessKiller'; it might occur 
for every call to 'signal_container_as_user'. 

> container-executor might kill process wrongly
> -
>
> Key: YARN-4459
> URL: https://issues.apache.org/jira/browse/YARN-4459
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-4459.01.patch, YARN-4459.02.patch
>
>
> When 'signal_container_as_user' is called in container-executor, it first 
> checks whether the process group exists; if not, it kills the process 
> itself (if the process exists). This is not reasonable: the process group not 
> existing means the corresponding container has finished, so if we kill the 
> process itself, we kill the wrong process.
> We have seen this happen in our cluster many times. We used the same account 
> for starting the NM and for the submitted app, and container-executor sometimes 
> killed the NM (the wrongly killed process might just be a newly started thread 
> and the NM's child process).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4482) Default values of several config parameters are missing

2015-12-22 Thread Tianyin Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068959#comment-15068959
 ] 

Tianyin Xu commented on YARN-4482:
--

I got some time to look into this. 
Yes... the values changed... (that's why the values are commented out...)

The default of {{yarn.client.failover-max-attempts}} becomes {{-1}}, while the 
defaults of {{yarn.client.failover-sleep-base-ms}} and 
{{yarn.client.failover-sleep-max-ms}} are equivalent to the default value of 
{{yarn.resourcemanager.connect.retry-interval.ms}} (which is currently 
{{30000}} ms). 

The code is in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java

{code:title=RMProxy.java|borderStyle=solid}
    long rmConnectionRetryIntervalMS =
        conf.getLong(
            YarnConfiguration.RESOURCEMANAGER_CONNECT_RETRY_INTERVAL_MS,
            YarnConfiguration
                .DEFAULT_RESOURCEMANAGER_CONNECT_RETRY_INTERVAL_MS);
    ..
    final long failoverSleepBaseMs = conf.getLong(
        YarnConfiguration.CLIENT_FAILOVER_SLEEPTIME_BASE_MS,
        rmConnectionRetryIntervalMS);

    final long failoverSleepMaxMs = conf.getLong(
        YarnConfiguration.CLIENT_FAILOVER_SLEEPTIME_MAX_MS,
        rmConnectionRetryIntervalMS);

    int maxFailoverAttempts = conf.getInt(
        YarnConfiguration.CLIENT_FAILOVER_MAX_ATTEMPTS, -1);
{code}

The information should be updated in the docs.
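
A quick way to confirm the effective values at runtime is to read them with the 
same fallback chain as the RMProxy snippet above. This is just an illustrative 
check, using the YarnConfiguration constants shown in that snippet:

{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class PrintFailoverDefaults {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    long retryIntervalMs = conf.getLong(
        YarnConfiguration.RESOURCEMANAGER_CONNECT_RETRY_INTERVAL_MS,
        YarnConfiguration.DEFAULT_RESOURCEMANAGER_CONNECT_RETRY_INTERVAL_MS);
    System.out.println("failover-max-attempts = "
        + conf.getInt(YarnConfiguration.CLIENT_FAILOVER_MAX_ATTEMPTS, -1));
    System.out.println("failover-sleep-base-ms = "
        + conf.getLong(YarnConfiguration.CLIENT_FAILOVER_SLEEPTIME_BASE_MS,
            retryIntervalMs));
    System.out.println("failover-sleep-max-ms = "
        + conf.getLong(YarnConfiguration.CLIENT_FAILOVER_SLEEPTIME_MAX_MS,
            retryIntervalMs));
  }
}
{code}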


> Default values of several config parameters are missing 
> 
>
> Key: YARN-4482
> URL: https://issues.apache.org/jira/browse/YARN-4482
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 2.6.2, 2.6.3
>Reporter: Tianyin Xu
>Assignee: Mohammad Shahid Khan
>Priority: Minor
>
> In {{yarn-default.xml}}, the default values of the following parameters are 
> commented out, 
> {{yarn.client.failover-max-attempts}}
> {{yarn.client.failover-sleep-base-ms}}
> {{yarn.client.failover-sleep-max-ms}}
> Have these default values changed (I suppose so)? If so, we should update them 
> in {{yarn-default.xml}}. Right now, I don't know the real "default" 
> values...
> (yarn-default.xml)
> https://hadoop.apache.org/docs/r2.6.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
> https://hadoop.apache.org/docs/r2.6.3/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-22 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068950#comment-15068950
 ] 

Li Lu commented on YARN-4224:
-

Thanks Varun. Some of my comments:
bq. You mean just have one endpoint and based on delimiters in UID, decide 
whether to fetch single entity or multiple ?
It's about locating resources in the system. /flows will be a query endpoint 
and we can send query parameters there, but /flows/\{uid\} will locate one 
single flow. What I'm confused about right now is why we need both the plural 
and singular forms /run/\{uid\} and /runs/\{uid\}. Will they resolve to 
the same run, given the same UID? 

bq. Because, the query before query for entities, in which we return UID, will 
be query for apps. This query is from application table where we do not have 
entity related information. So I cannot send all the possible entity types for 
an entity or a list of entities in the response. Hence when we query list of 
entities, it is within the scope of app UID and entity type. Hence entity type 
has to be specified.
bq. Yes, if we try to query all entity types and related entities, this would 
require scanning quite a bit of the entity table which can grow quite big. And 
from UI, I envisage only queries for APP_ATTEMPT and CONTAINER so we would know 
the entity type.

So the UID in /entities/\{entitytype\}/\{uid\}/ is actually an app UID? That makes 
the whole endpoint look really weird... I thought it was an entity UID used to 
locate one timeline entity. However, I think you raised a very useful use case: 
querying a certain type of entity for one application. Maybe we should change 
the format of this endpoint to address this case? I don't really like the 
current form of the endpoint...

bq. Moreover, runs endpoint will do it for you i.e. fetch all flowruns for a 
flow.
OK, this works for now. In the future, if flows are associated with flow-level 
aggregation data, we will need endpoints to retrieve flow-level data. We can 
skip this step for our first milestone, though. 

bq. There is. The \/entity\/{uid}\/ endpoint. I hope this is what your question 
was.
So to find one entity given cluster, user, flow, flowrun, appid and entity id, 
we do not have a hierarchical endpoint, but can only get an entity through 
the UID interface? Do we need the hierarchical interface for CLIs? 

We can certainly discuss more of these in our meeting tomorrow. 


> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list

2015-12-22 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068940#comment-15068940
 ] 

Daniel Templeton commented on YARN-4311:


Oh, and I'm a big proponent of using blank lines to make the code easier to 
read.

> Removing nodes from include and exclude lists will not remove them from 
> decommissioned nodes list
> -
>
> Key: YARN-4311
> URL: https://issues.apache.org/jira/browse/YARN-4311
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-4311-v1.patch, YARN-4311-v2.patch, 
> YARN-4311-v3.patch
>
>
> In order to fully forget about a node, removing the node from include and 
> exclude list is not sufficient. The RM lists it under Decomm-ed nodes. The 
> tricky part that [~jlowe] pointed out was the case when include lists are not 
> used, in that case we don't want the nodes to fall off if they are not active.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list

2015-12-22 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068939#comment-15068939
 ] 

Daniel Templeton commented on YARN-4311:


In the constructor, you check if the inactive nodes list entries are null 
before using them.  Is that necessary?  Further down in {{refreshNodes()}} you 
don't.  Same thing in {{refreshNodesGracefully()}}.

Inside the new for loop in {{refreshNodes()}} you should combine the nested ifs 
into a single conditional.

The multiple ifs and returns in {{isUntracked()}} should be combined:

{code}
  public boolean isUntrackedNode(String hostName) {
    boolean untracked = false;
    String ip = resolver.resolve(hostName);

    includesFile =
        conf.get(YarnConfiguration.RM_NODES_INCLUDE_FILE_PATH,
            YarnConfiguration.DEFAULT_RM_NODES_INCLUDE_FILE_PATH);

    synchronized (hostsReader) {
      Set<String> hostsList = hostsReader.getHosts();
      Set<String> excludeList = hostsReader.getExcludedHosts();

      untracked = !hostsList.isEmpty() &&
          !hostsList.contains(hostName) && !hostsList.contains(ip) &&
          !excludeList.contains(hostName) && !excludeList.contains(ip);
    }

    return untracked;
  }
{code}

Do note that with this solution, if a user does a node refresh at least once 
per node removal check interval, no nodes will ever be expunged because the 
timestamp will continually be updated and never exceed the interval.

> Removing nodes from include and exclude lists will not remove them from 
> decommissioned nodes list
> -
>
> Key: YARN-4311
> URL: https://issues.apache.org/jira/browse/YARN-4311
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: YARN-4311-v1.patch, YARN-4311-v2.patch, 
> YARN-4311-v3.patch
>
>
> In order to fully forget about a node, removing the node from include and 
> exclude list is not sufficient. The RM lists it under Decomm-ed nodes. The 
> tricky part that [~jlowe] pointed out was the case when include lists are not 
> used, in that case we don't want the nodes to fall off if they are not active.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4138) Roll back container resource allocation after resource increase token expires

2015-12-22 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068934#comment-15068934
 ] 

Jian He commented on YARN-4138:
---

Looks like some files are created in the wrong directory, "hadoop-yarn/", instead 
of "hadoop-yarn-project". 

> Roll back container resource allocation after resource increase token expires
> -
>
> Key: YARN-4138
> URL: https://issues.apache.org/jira/browse/YARN-4138
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, nodemanager, resourcemanager
>Reporter: MENG DING
>Assignee: MENG DING
> Attachments: YARN-4138-YARN-1197.1.patch, 
> YARN-4138-YARN-1197.2.patch, YARN-4138.3.patch
>
>
> In YARN-1651, after container resource increase token expires, the running 
> container is killed.
> This ticket will change the behavior such that when a container resource 
> increase token expires, the resource allocation of the container will be 
> reverted back to the value before the increase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2885) Create AMRMProxy request interceptor for distributed scheduling decisions for queueable containers

2015-12-22 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-2885:
--
Attachment: YARN-2885-yarn-2877.v4.patch

Rebasing patch against latest YARN-2882 and YARN-4335 patches

> Create AMRMProxy request interceptor for distributed scheduling decisions for 
> queueable containers
> --
>
> Key: YARN-2885
> URL: https://issues.apache.org/jira/browse/YARN-2885
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, resourcemanager
>Reporter: Konstantinos Karanasos
>Assignee: Arun Suresh
> Attachments: YARN-2885-yarn-2877.001.patch, 
> YARN-2885-yarn-2877.002.patch, YARN-2885-yarn-2877.full-2.patch, 
> YARN-2885-yarn-2877.full-3.patch, YARN-2885-yarn-2877.full.patch, 
> YARN-2885-yarn-2877.v4.patch, YARN-2885_api_changes.patch
>
>
> We propose to add a Local ResourceManager (LocalRM) to the NM in order to 
> support distributed scheduling decisions. 
> Architecturally we leverage the RMProxy, introduced in YARN-2884. 
> The LocalRM makes distributed decisions for queueable container requests. 
> Guaranteed-start requests are still handled by the central RM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4397) if this addAll() function`s params is fault? @NodeListManager#getUnusableNodes()

2015-12-22 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-4397:
--
Fix Version/s: (was: 2.8.0)

> if this addAll() function`s params is fault? 
> @NodeListManager#getUnusableNodes()
> 
>
> Key: YARN-4397
> URL: https://issues.apache.org/jira/browse/YARN-4397
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.6.0
>Reporter: Feng Yuan
>
> code in NodeListManager#144L:
>   /**
>* Provides the currently unusable nodes. Copies it into provided 
> collection.
>* @param unUsableNodes
>*  Collection to which the unusable nodes are added
>* @return number of unusable nodes added
>*/
>   public int getUnusableNodes(Collection unUsableNodes) {
> unUsableNodes.addAll(unusableRMNodesConcurrentSet);
> return unusableRMNodesConcurrentSet.size();
>   }
> unUsableNodes and unusableRMNodesConcurrentSet's sequence is wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4062) Add the flush and compaction functionality via coprocessors and scanners for flow run table

2015-12-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068650#comment-15068650
 ] 

Hadoop QA commented on YARN-4062:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
5s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
10s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s 
{color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
17s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
36s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 15s 
{color} | {color:red} hadoop-yarn-server-timelineservice in feature-YARN-2928 
failed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 1m 52s {color} 
| {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice-jdk1.8.0_66
 with JDK v1.8.0_66 generated 1 new issues (was 0, now 1). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 2m 12s {color} 
| {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice-jdk1.7.0_91
 with JDK v1.7.0_91 generated 1 new issues (was 0, now 1). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 10s 
{color} | {color:red} Patch generated 21 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice
 (total was 45, now 65). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s 
{color} | {color:red} The patch has 12 line(s) that end in whitespace. Use git 
apply --whitespace=fix. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 44s 
{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice
 introduced 1 new FindBugs issues. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 15s 
{color} | {color:red} hadoop-yarn-server-timelineservice in the patch failed 
with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 2m 1s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice-jdk1.7.0_91
 with JDK v1.7.0_91 generated 5 new issues (was 0, now 5). {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit 

[jira] [Commented] (YARN-3480) Recovery may get very slow with lots of services with lots of app-attempts

2015-12-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068589#comment-15068589
 ] 

Hadoop QA commented on YARN-3480:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
46s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
34s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m 45s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 53s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
26s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 146m 38s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestClientRMTokens |
|   | hadoop.yarn.server.resourcemanager.TestAMAuthorization |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12779088/YARN-3480.12.patch |
| JIRA Issue | YARN-3480 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  c

[jira] [Updated] (YARN-4062) Add the flush and compaction functionality via coprocessors and scanners for flow run table

2015-12-22 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated YARN-4062:
-
Attachment: YARN-4062-feature-YARN-2928.01.patch

Thanks [~sjlee0], renamed the patch now. 

> Add the flush and compaction functionality via coprocessors and scanners for 
> flow run table
> ---
>
> Key: YARN-4062
> URL: https://issues.apache.org/jira/browse/YARN-4062
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4062-YARN-2928.1.patch, 
> YARN-4062-feature-YARN-2928.01.patch
>
>
> As part of YARN-3901, a coprocessor and scanner are being added for storing 
> into the flow_run table. It also needs flush & compaction processing in the 
> coprocessor and perhaps a new scanner to deal with the data during the flushing 
> and compaction stages. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4098) Document ApplicationPriority feature

2015-12-22 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068535#comment-15068535
 ] 

Sunil G commented on YARN-4098:
---

Hi [~rohithsharma]
Thank you for sharing the patch.

Some minor nits.
CS Page:
1. 
{noformat}
Higher the integer higher the priority of an applications. Note : Application 
priority is supported only for FIFO ordering policy.
{noformat}
Could we rephrase this as {{Higher integer value indicates higher priority for 
the application. Currently Application priority is supported only for FIFO 
ordering policy.}}

2. {{Application priority works along with FIFO ordering policy only}} can be 
{{Application priority works *only* along with FIFO ordering policy}}
3. {{Default priority for an applications can be at cluster level and queue 
level.}} it's better to mention it as {{an application}}.
4. Missing */* in {{etc/hadoop/capacity-scheduler.xml}}
5. typo in 
{{yarn.scheduler.capacity.root.<queue-path>.default-application-priority}}; 
it can be {{<leaf-queue-path>}}.

> Document ApplicationPriority feature
> 
>
> Key: YARN-4098
> URL: https://issues.apache.org/jira/browse/YARN-4098
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-4098.patch, 0001-YARN-4098.patch, YARN-4098.rar
>
>
> This JIRA is to track documentation of application priority and its user, 
> admin and REST interfaces.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4352) Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient

2015-12-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068509#comment-15068509
 ] 

Hadoop QA commented on YARN-4352:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
4s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 46s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 25s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
55s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 13s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 13s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 2s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
16s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 2s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 1s {color} | 
{color:red} hadoop-common in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 13s {color} 
| {color:red} hadoop-common in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
23s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 74m 12s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.net.TestNetUtils |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.security.ssl.TestReloadingX509TrustManager |
|   | hadoop.net.TestNetUtils |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12779090/0001-YARN-4352.patch |
| JIRA Issue | YARN-4352 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux dc18371b873e 3.13.0-36-l

[jira] [Commented] (YARN-4492) Add documentation for queue level preemption which is supported in Capacity scheduler

2015-12-22 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068503#comment-15068503
 ] 

Daniel Templeton commented on YARN-4492:


+1 (non-binding).  Thanks, [~Naganarasimha]!

> Add documentation for queue level preemption which is supported in Capacity 
> scheduler
> -
>
> Key: YARN-4492
> URL: https://issues.apache.org/jira/browse/YARN-4492
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: CapacityScheduler.html, YARN-4492.v1.001.patch, 
> YARN-4492.v1.002.patch, YARN-4492.v1.003.patch
>
>
> As part of YARN-2056, support has been added to disable preemption for a 
> specific queue. This is a useful feature in a cluster with mixed workloads but 
> is currently missing documentation. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4352) Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient

2015-12-22 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068493#comment-15068493
 ] 

Rohith Sharma K S commented on YARN-4352:
-

Tx Sunil for providing the patch. The 3rd approach looks fine to me. I ran the 
tests before and after applying the patch on my laptop (Ubuntu), and the test 
cases are passing. But I am not sure why these test cases have been failing for 
the last couple of months!!! I suspect the Jenkins machine may have changed. 
+1 lgtm, I will commit it tomorrow if there are no objections

> Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient
> 
>
> Key: YARN-4352
> URL: https://issues.apache.org/jira/browse/YARN-4352
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Sunil G
>  Labels: security
> Attachments: 0001-YARN-4352.patch
>
>
> From 
> https://builds.apache.org/job/PreCommit-YARN-Build/9661/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client-jdk1.7.0_79.txt,
>  we can see the tests in TestYarnClient, TestAMRMClient and TestNMClient get 
> timeout which can be reproduced locally.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics

2015-12-22 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068490#comment-15068490
 ] 

Sangjin Lee commented on YARN-3816:
---

Just to clarify, it appears both aggregated (but not accumulated over time) 
metrics and the aggregated *and* accumulated metrics (\*-AREA metrics) end up 
in these separate entities.

While it might be fine for the \*-AREA metrics to be in the separate entities, 
I think it would be better for the regularly aggregated metrics to be in the 
application.

> [Aggregation] App-level aggregation and accumulation for YARN system metrics
> 
>
> Key: YARN-3816
> URL: https://issues.apache.org/jira/browse/YARN-3816
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Junping Du
>Assignee: Junping Du
>  Labels: yarn-2928-1st-milestone
> Attachments: Application Level Aggregation of Timeline Data.pdf, 
> YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, 
> YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, 
> YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, 
> YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch, 
> YARN-3816-feature-YARN-2928.v4.1.patch, YARN-3816-poc-v1.patch, 
> YARN-3816-poc-v2.patch
>
>
> We need application level aggregation of Timeline data:
> - To present end user aggregated states for each application, include: 
> resource (CPU, Memory) consumption across all containers, number of 
> containers launched/completed/failed, etc. We need this for apps while they 
> are running as well as when they are done.
> - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be 
> aggregated to show details of states in framework level.
> - Aggregation at other levels (Flow/User/Queue) can be more efficient when 
> based on Application-level aggregations rather than raw entity-level data, as 
> far fewer rows need to be scanned (after filtering out non-aggregated entities, 
> like: events, configurations, etc.).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient

2015-12-22 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068468#comment-15068468
 ] 

Naganarasimha G R commented on YARN-3367:
-

Some of the checkstyle issues cannot be handled (line lengths > 80); some of the 
test case and findbugs issues are valid and can be fixed along with review 
comments on the approach.

> Replace starting a separate thread for post entity with event loop in 
> TimelineClient
> 
>
> Key: YARN-3367
> URL: https://issues.apache.org/jira/browse/YARN-3367
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Junping Du
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3367-feature-YARN-2928.003.patch, 
> YARN-3367-feature-YARN-2928.v1.002.patch, 
> YARN-3367-feature-YARN-2928.v1.004.patch, YARN-3367.YARN-2928.001.patch
>
>
> Since YARN-3039, we added a loop in TimelineClient to wait for the 
> collectorServiceAddress to be ready before posting any entity. In consumers of 
> TimelineClient (like the AM), we are starting a new thread for each call to get 
> rid of a potential deadlock in the main thread. This approach has at least 3 
> major defects:
> 1. The consumer needs some additional code to wrap a thread before calling 
> putEntities() in TimelineClient.
> 2. It costs many thread resources, which is unnecessary.
> 3. The sequence of events could be out of order because each posting 
> thread gets out of the waiting loop randomly.
> We should have something like an event loop on the TimelineClient side: 
> putEntities() only puts the related entities into a queue, and a separate 
> thread delivers the entities in the queue to the collector via REST 
> calls.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4062) Add the flush and compaction functionality via coprocessors and scanners for flow run table

2015-12-22 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068450#comment-15068450
 ] 

Sangjin Lee commented on YARN-4062:
---

You'll need to use a name like {{YARN-4062-feature-YARN-2928.01.patch}}.

> Add the flush and compaction functionality via coprocessors and scanners for 
> flow run table
> ---
>
> Key: YARN-4062
> URL: https://issues.apache.org/jira/browse/YARN-4062
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Vrushali C
>Assignee: Vrushali C
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4062-YARN-2928.1.patch
>
>
> As part of YARN-3901, a coprocessor and scanner are being added for storing into 
> the flow_run table. It also needs flush & compaction processing in the 
> coprocessor, and perhaps a new scanner, to deal with the data during the flushing 
> and compaction stages. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4352) Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient

2015-12-22 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4352:
-
Labels: security  (was: )

> Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient
> 
>
> Key: YARN-4352
> URL: https://issues.apache.org/jira/browse/YARN-4352
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Sunil G
>  Labels: security
> Attachments: 0001-YARN-4352.patch
>
>
> From 
> https://builds.apache.org/job/PreCommit-YARN-Build/9661/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client-jdk1.7.0_79.txt,
>  we can see the tests in TestYarnClient, TestAMRMClient and TestNMClient get 
> timeout which can be reproduced locally.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3223) Resource update during NM graceful decommission

2015-12-22 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068420#comment-15068420
 ] 

Junping Du commented on YARN-3223:
--

Hi [~brookz], thanks for updating the patch. The current approach sounds OK to 
me. The only issue is that there is a time window between completedContainer() 
and the RMNodeResourceUpdateEvent being handled. If a scheduling effort happens 
within this window, a new container could still get allocated on this node. 
An even worse case is when the scheduling effort happens after the 
RMNodeResourceUpdateEvent is sent out but before it propagates to SchedulerNode; 
then you will find the total resource is lower than the used resource and the 
available resource is a negative value. 
IMO, a safer way is: besides your existing RMNodeResourceUpdateEvent update, in 
completedContainer() for decommissioning nodes, we can hold off adding back 
availableResource in SchedulerNode, but continue to deduct usedResource. At that 
moment, SchedulerNode's total resource will be lower than usedResource + 
availableResource, but it will soon be corrected once the RMNodeResourceUpdateEvent 
comes. How does this sound?
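
A rough sketch of the proposed handling (class and field names are illustrative, 
not the actual SchedulerNode code):
{code}
// Illustrative sketch only, not the real SchedulerNode: resources are
// simplified to ints, and "decommissioning" is assumed to be a flag set
// when the node enters graceful decommission.
class SketchSchedulerNode {
  private int totalResource;
  private int usedResource;
  private int availableResource;
  private volatile boolean decommissioning;

  synchronized void releaseContainer(int containerResource) {
    // Always stop charging the finished container against the node.
    usedResource -= containerResource;
    if (!decommissioning) {
      // Normal path: the freed capacity becomes schedulable again.
      availableResource += containerResource;
    }
    // Decommissioning path: hold the capacity back so the scheduler cannot
    // place a new container here; the invariant total == used + available
    // is temporarily broken until the RMNodeResourceUpdateEvent arrives.
  }

  synchronized void updateTotalResource(int newTotal) {
    // Called when the resource-update event finally reaches the scheduler.
    totalResource = newTotal;
    availableResource = Math.max(0, totalResource - usedResource);
  }
}
{code}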

> Resource update during NM graceful decommission
> ---
>
> Key: YARN-3223
> URL: https://issues.apache.org/jira/browse/YARN-3223
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, resourcemanager
>Affects Versions: 2.7.1
>Reporter: Junping Du
>Assignee: Brook Zhou
> Attachments: YARN-3223-v0.patch, YARN-3223-v1.patch, 
> YARN-3223-v2.patch, YARN-3223-v3.patch
>
>
> During NM graceful decommission, we should handle resource updates properly, 
> including: make RMNode keep track of the old resource for possible rollback, 
> keep the available resource at 0, and have the used resource get updated when 
> a container finishes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3542) Re-factor support for CPU as a resource using the new ResourceHandler mechanism

2015-12-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068403#comment-15068403
 ] 

Hadoop QA commented on YARN-3542:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 0s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 15s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
26s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
24s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 15s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
56s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 8m 17s {color} 
| {color:red} hadoop-yarn-project_hadoop-yarn-jdk1.8.0_66 with JDK v1.8.0_66 
generated 1 new issues (was 9, now 10). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 5s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 18s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 10m 35s 
{color} | {color:red} hadoop-yarn-project_hadoop-yarn-jdk1.7.0_91 with JDK 
v1.7.0_91 generated 2 new issues (was 10, now 12). {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 18s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 29s 
{color} | {color:red} Patch generated 4 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 269, now 265). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 15s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 21s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 36s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 37s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with 
JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 

[jira] [Updated] (YARN-4352) Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient

2015-12-22 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-4352:
--
Attachment: 0001-YARN-4352.patch

Attaching a simpler patch to handle the lookup when multiple loopback addresses 
are present in {{/etc/hosts}}.
[~djp]/[~rohithsharma]/[~ozawa], could you please take a look?
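
For illustration only (this is not the attached patch), one way a test helper 
could pick a usable loopback address when {{localhost}} maps to several entries:
{code}
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper: resolve "localhost" and prefer the first entry that is
// actually a loopback address (e.g. when /etc/hosts lists 127.0.0.1 and ::1).
public final class LoopbackLookup {
  private LoopbackLookup() {
  }

  public static InetAddress firstLoopback() throws UnknownHostException {
    for (InetAddress addr : InetAddress.getAllByName("localhost")) {
      if (addr.isLoopbackAddress()) {
        return addr;
      }
    }
    return InetAddress.getLoopbackAddress();  // JDK fallback, never null
  }

  public static void main(String[] args) throws UnknownHostException {
    System.out.println("Binding test services to "
        + firstLoopback().getHostAddress());
  }
}
{code}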

> Timeout for tests in TestYarnClient, TestAMRMClient and TestNMClient
> 
>
> Key: YARN-4352
> URL: https://issues.apache.org/jira/browse/YARN-4352
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Junping Du
>Assignee: Sunil G
> Attachments: 0001-YARN-4352.patch
>
>
> From 
> https://builds.apache.org/job/PreCommit-YARN-Build/9661/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client-jdk1.7.0_79.txt,
>  we can see the tests in TestYarnClient, TestAMRMClient and TestNMClient get 
> timeout which can be reproduced locally.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3480) Recovery may get very slow with lots of services with lots of app-attempts

2015-12-22 Thread Jun Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068394#comment-15068394
 ] 

Jun Gong commented on YARN-3480:


Attached a new patch that moves the attempt-removal logic into RMStateStore, so 
it can handle both failure cases: failing to store attempts and failing to remove 
attempts.
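
A rough sketch of the retention idea (names are made up, this is not the patch): 
cap the number of attempts kept in the store and let the caller remove the oldest 
one whenever the cap is exceeded.
{code}
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical bookkeeping: track stored attempt IDs per app and report which
// attempt should be removed from the state store once the cap is exceeded.
class AttemptRetention {
  private final int maxAttemptsStored;
  private final Deque<String> storedAttemptIds = new ArrayDeque<>();

  AttemptRetention(int maxAttemptsStored) {
    this.maxAttemptsStored = maxAttemptsStored;
  }

  /** Returns the attempt ID to remove from the store, or null if under the cap. */
  synchronized String onAttemptStored(String attemptId) {
    storedAttemptIds.addLast(attemptId);
    if (storedAttemptIds.size() > maxAttemptsStored) {
      return storedAttemptIds.pollFirst();  // caller issues the (possibly async) remove
    }
    return null;
  }
}
{code}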

> Recovery may get very slow with lots of services with lots of app-attempts
> --
>
> Key: YARN-3480
> URL: https://issues.apache.org/jira/browse/YARN-3480
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-3480.01.patch, YARN-3480.02.patch, 
> YARN-3480.03.patch, YARN-3480.04.patch, YARN-3480.05.patch, 
> YARN-3480.06.patch, YARN-3480.07.patch, YARN-3480.08.patch, 
> YARN-3480.09.patch, YARN-3480.10.patch, YARN-3480.11.patch, YARN-3480.12.patch
>
>
> When RM HA is enabled and running containers are kept across attempts, apps 
> are more likely to finish successfully with more retries (attempts), so it 
> is better to set 'yarn.resourcemanager.am.max-attempts' to a larger value. However, 
> that makes RMStateStore (FileSystem/HDFS/ZK) store more attempts and makes the 
> RM recovery process much slower. It might be better to cap the number of attempts 
> stored in RMStateStore.
> BTW: when 'attemptFailuresValidityInterval' (introduced in YARN-611) is set to 
> a small value, the number of retried attempts might become very large, so we need 
> to delete some of the attempts stored in RMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3480) Recovery may get very slow with lots of services with lots of app-attempts

2015-12-22 Thread Jun Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Gong updated YARN-3480:
---
Attachment: YARN-3480.12.patch

> Recovery may get very slow with lots of services with lots of app-attempts
> --
>
> Key: YARN-3480
> URL: https://issues.apache.org/jira/browse/YARN-3480
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-3480.01.patch, YARN-3480.02.patch, 
> YARN-3480.03.patch, YARN-3480.04.patch, YARN-3480.05.patch, 
> YARN-3480.06.patch, YARN-3480.07.patch, YARN-3480.08.patch, 
> YARN-3480.09.patch, YARN-3480.10.patch, YARN-3480.11.patch, YARN-3480.12.patch
>
>
> When RM HA is enabled and running containers are kept across attempts, apps 
> are more likely to finish successfully with more retries (attempts), so it 
> is better to set 'yarn.resourcemanager.am.max-attempts' to a larger value. However, 
> that makes RMStateStore (FileSystem/HDFS/ZK) store more attempts and makes the 
> RM recovery process much slower. It might be better to cap the number of attempts 
> stored in RMStateStore.
> BTW: when 'attemptFailuresValidityInterval' (introduced in YARN-611) is set to 
> a small value, the number of retried attempts might become very large, so we need 
> to delete some of the attempts stored in RMStateStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4494) Recover completed apps asynchronously

2015-12-22 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068377#comment-15068377
 ] 

Sunil G commented on YARN-4494:
---

Thanks [~hex108] for filing this. It will speed up the recovery.
For the specific scenario mentioned here, I feel a fast recovery is better than 
blocking the entire client call. We are not sure how much time recovery of 
completed apps may take, so an on-demand fast recovery looks like a good option, 
provided we keep some cached information so that these fast-recovered apps are 
not recovered again.
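
A minimal sketch of that idea (illustrative names, not the actual patch): recover 
completed apps on a background thread and remember which ones were already 
fast-recovered on demand.
{code}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: the "recovered" set is the cache that prevents a
// fast-recovered app from being recovered a second time by the async pass.
class CompletedAppRecovery {
  private final Set<String> recovered = ConcurrentHashMap.newKeySet();
  private final ExecutorService pool = Executors.newSingleThreadExecutor();

  void recoverAsync(Iterable<String> completedAppIds) {
    pool.submit(() -> {
      for (String appId : completedAppIds) {
        recoverIfNeeded(appId);
      }
    });
  }

  /** Called on demand when a client asks for a report of a completed app. */
  void recoverIfNeeded(String appId) {
    if (recovered.add(appId)) {   // add() is atomic; only the first caller recovers
      // ... load the app state from the store and re-create the finished RMApp ...
    }
  }
}
{code}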



> Recover completed apps asynchronously
> -
>
> Key: YARN-4494
> URL: https://issues.apache.org/jira/browse/YARN-4494
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Reporter: Jun Gong
>Assignee: Jun Gong
>
> With RM HA enabled, when recovering apps, recover completed apps 
> asynchronously.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient

2015-12-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068355#comment-15068355
 ] 

Hadoop QA commented on YARN-3367:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 
53s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 13s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 28s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
32s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 35s 
{color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
40s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
43s {color} | {color:green} feature-YARN-2928 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 28s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 49s 
{color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
35s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 21s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 39s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 39s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 31s 
{color} | {color:red} Patch generated 13 new checkstyle issues in 
hadoop-yarn-project/hadoop-yarn (total was 48, now 58). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 36s 
{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
introduced 2 new FindBugs issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 28s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 52s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 10s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 19s {color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 27s 
{color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_91. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 19s 
{color} | {color:green} hadoop-yarn-common in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} unit {col

[jira] [Commented] (YARN-4373) Jobs can be temporarily forgotten during recovery

2015-12-22 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068335#comment-15068335
 ] 

Daniel Templeton commented on YARN-4373:


I'm also incredulous.  I'm still working to reproduce the issue.  It was 
reported by our testing team.  If/when I reproduce it, I'll post the details.

> Jobs can be temporarily forgotten during recovery
> -
>
> Key: YARN-4373
> URL: https://issues.apache.org/jira/browse/YARN-4373
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Critical
>
> The RM becomes available to service requests before state store recovery is 
> started.  Before recovery and during the recovery period, it's possible for a 
> client to request an application report for a running application, to which 
> the RM will respond that the application is unknown.
> I'm seeing this issue with Oozie during an RM failover.  Until the active 
> finishes recovery, it reports erroneous information to Oozie, which doesn't 
> have context to know that it should just try again later.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3586) RM only get back addresses of Collectors that NM needs to know.

2015-12-22 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068316#comment-15068316
 ] 

Junping Du commented on YARN-3586:
--

Thanks all for help in review!

> RM only get back addresses of Collectors that NM needs to know.
> ---
>
> Key: YARN-3586
> URL: https://issues.apache.org/jira/browse/YARN-3586
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, timelineserver
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Fix For: YARN-2928
>
> Attachments: YARN-3586-demo.patch, YARN-3586-feature-YARN-2928.patch, 
> YARN-3586-feature-YARN-2928.v2.patch
>
>
> After YARN-3445, the RM caches runningApps for each NM, so the RM heartbeat back 
> to an NM should only include the collectors' addresses for applications running 
> on that specific NM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3542) Re-factor support for CPU as a resource using the new ResourceHandler mechanism

2015-12-22 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-3542:

Attachment: YARN-3542.007.patch

Uploaded a new patch that'll apply on trunk. Just a note on the configurations 
- all the existing configurations for CPU isolation from 
CGroupsLCEResourceHandler carry over without any change. My expectation is that 
we'll continue to use them going forward. The only issue we haven't quite 
figured out is the configs for enabling the resource handlers.

> Re-factor support for CPU as a resource using the new ResourceHandler 
> mechanism
> ---
>
> Key: YARN-3542
> URL: https://issues.apache.org/jira/browse/YARN-3542
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Sidharta Seethana
>Assignee: Varun Vasudev
>Priority: Critical
> Attachments: YARN-3542.001.patch, YARN-3542.002.patch, 
> YARN-3542.003.patch, YARN-3542.004.patch, YARN-3542.005.patch, 
> YARN-3542.006.patch, YARN-3542.007.patch
>
>
> In YARN-3443 , a new ResourceHandler mechanism was added which enabled easier 
> addition of new resource types in the nodemanager (this was used for network 
> as a resource - See YARN-2140 ). We should refactor the existing CPU 
> implementation ( LinuxContainerExecutor/CgroupsLCEResourcesHandler ) using 
> the new ResourceHandler mechanism. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3586) RM only get back addresses of Collectors that NM needs to know.

2015-12-22 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068238#comment-15068238
 ] 

Varun Saxena commented on YARN-3586:


Will commit it shortly.

> RM only get back addresses of Collectors that NM needs to know.
> ---
>
> Key: YARN-3586
> URL: https://issues.apache.org/jira/browse/YARN-3586
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, timelineserver
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3586-demo.patch, YARN-3586-feature-YARN-2928.patch, 
> YARN-3586-feature-YARN-2928.v2.patch
>
>
> After YARN-3445, the RM caches runningApps for each NM, so the RM heartbeat back 
> to an NM should only include the collectors' addresses for applications running 
> on that specific NM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3367) Replace starting a separate thread for post entity with event loop in TimelineClient

2015-12-22 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3367:

Attachment: YARN-3367-feature-YARN-2928.v1.004.patch

WIP patch with fixes for test case

> Replace starting a separate thread for post entity with event loop in 
> TimelineClient
> 
>
> Key: YARN-3367
> URL: https://issues.apache.org/jira/browse/YARN-3367
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Junping Du
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3367-feature-YARN-2928.003.patch, 
> YARN-3367-feature-YARN-2928.v1.002.patch, 
> YARN-3367-feature-YARN-2928.v1.004.patch, YARN-3367.YARN-2928.001.patch
>
>
> Since YARN-3039, we added a loop in TimelineClient to wait for the 
> collectorServiceAddress to be ready before posting any entity. In consumers of 
> TimelineClient (like the AM), we start a new thread for each call to get 
> rid of a potential deadlock in the main thread. This approach has at least 3 major 
> defects:
> 1. The consumer needs some additional code to wrap a thread before calling 
> putEntities() in TimelineClient.
> 2. It costs many thread resources, which is unnecessary.
> 3. The sequence of events could be out of order because each posting 
> thread gets out of the waiting loop randomly.
> We should have something like an event loop on the TimelineClient side: 
> putEntities() only puts the related entities into a queue, and a 
> separate thread delivers the entities in the queue to the collector via REST 
> calls.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3586) RM only get back addresses of Collectors that NM needs to know.

2015-12-22 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068187#comment-15068187
 ] 

Naganarasimha G R commented on YARN-3586:
-

+1, the latest patch LGTM.

> RM only get back addresses of Collectors that NM needs to know.
> ---
>
> Key: YARN-3586
> URL: https://issues.apache.org/jira/browse/YARN-3586
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager, timelineserver
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3586-demo.patch, YARN-3586-feature-YARN-2928.patch, 
> YARN-3586-feature-YARN-2928.v2.patch
>
>
> After YARN-3445, the RM caches runningApps for each NM, so the RM heartbeat back 
> to an NM should only include the collectors' addresses for applications running 
> on that specific NM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3995) Some of the NM events are not getting published due race condition when AM container finishes in NM

2015-12-22 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068068#comment-15068068
 ] 

Naganarasimha G R commented on YARN-3995:
-

Hi [~sjlee0],
As per the discussion we had in the status call, we planned to stop the 
collector 2 seconds after the AM container finishes, but we already have 
code which waits for one second and then closes the collector.
Now, IIUC, the scope of this jira is:
# Introduce a configurable period to wait
# Instead of spawning multiple threads, maybe we can have a single thread which 
does this activity?

Or do we need to introduce something else?
bq. When RM finishes the attempt then it can send one finish event through 
timelineclient
IMO this also will not guarantee that no event is missed, so I think a 
configurable wait period is better. Thoughts ?
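
A minimal sketch of points 1 and 2 above (the class and the configuration handling 
are assumptions, not the real NM code): a single scheduling thread and a 
configurable delay before the collector is stopped.
{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one scheduler thread stops an app's collector a
// configurable number of milliseconds after the AM container finished, instead
// of spawning a new thread per application.
class CollectorStopScheduler {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private final long stopDelayMs;   // would come from a new configuration property

  CollectorStopScheduler(long stopDelayMs) {
    this.stopDelayMs = stopDelayMs;
  }

  void scheduleStop(String appId, Runnable stopCollector) {
    // Events still in the pipeline for appId get stopDelayMs to drain before
    // the collector is removed.
    scheduler.schedule(stopCollector, stopDelayMs, TimeUnit.MILLISECONDS);
  }
}
{code}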


> Some of the NM events are not getting published due race condition when AM 
> container finishes in NM 
> 
>
> Key: YARN-3995
> URL: https://issues.apache.org/jira/browse/YARN-3995
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: nodemanager, timelineserver
>Affects Versions: YARN-2928
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>  Labels: yarn-2928-1st-milestone
>
> As discussed in YARN-3045: while testing with TestDistributedShell, we found 
> that a few of the container metrics events were failing because of a race 
> condition. When the AM container finishes and removes the collector for the 
> app, there is still a possibility that events published for the app by 
> the current NM and other NMs are still in the pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4470) Application Master in-place upgrade

2015-12-22 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068034#comment-15068034
 ] 

Steve Loughran commented on YARN-4470:
--

Talk to [~gsaha] about the work. Thinking more about it, one thing we can't 
currently do is change the command line: JVM heap, CLI arguments, etc. Having a 
way to update the AM launch context in place prior to triggering an AM restart 
could address that.

> Application Master in-place upgrade
> ---
>
> Key: YARN-4470
> URL: https://issues.apache.org/jira/browse/YARN-4470
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: resourcemanager
>Reporter: Giovanni Matteo Fumarola
>Assignee: Giovanni Matteo Fumarola
> Attachments: AM in-place upgrade design. rev1.pdf
>
>
> It would be nice if clients could ask for an AM in-place upgrade.
> It will give YARN the possibility to upgrade the AM without losing the work 
> done within its containers. This allows deploying bug-fixes and new versions 
> of the AM without incurring long service downtimes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3863) Enhance filters in TimelineReader

2015-12-22 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068029#comment-15068029
 ] 

Varun Saxena commented on YARN-3863:


Will update patch to fix checkstyle, whitespace and javac issues

> Enhance filters in TimelineReader
> -
>
> Key: YARN-3863
> URL: https://issues.apache.org/jira/browse/YARN-3863
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-3863-feature-YARN-2928.wip.003.patch, 
> YARN-3863-feature-YARN-2928.wip.01.patch, 
> YARN-3863-feature-YARN-2928.wip.02.patch, 
> YARN-3863-feature-YARN-2928.wip.04.patch, 
> YARN-3863-feature-YARN-2928.wip.05.patch
>
>
> Currently filters in timeline reader will return an entity only if all the 
> filter conditions hold true i.e. only AND operation is supported. We can 
> support OR operation for the filters as well. Additionally as primary backend 
> implementation is HBase, we can design our filters in a manner, where they 
> closely resemble HBase Filters.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4109) Exception on RM scheduler page loading with labels

2015-12-22 Thread Mohammad Shahid Khan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068001#comment-15068001
 ] 

Mohammad Shahid Khan commented on YARN-4109:


The UT failures are unrelated to the current patch.
No UT addition/modification is required; this is only a small UI change.


> Exception on RM scheduler page loading with labels
> --
>
> Key: YARN-4109
> URL: https://issues.apache.org/jira/browse/YARN-4109
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Mohammad Shahid Khan
>Priority: Minor
> Attachments: YARN-4109_1.patch
>
>
> Configure node label and load scheduler Page
> On each reload of the page the below exception gets thrown in logs
> {code}
> 2015-09-03 11:27:08,544 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
> handling URI: /cluster/scheduler
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>   at 
> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
>   at 
> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
>   at 
> com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:139)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
>   at 
> com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
>   at 
> com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
>   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:663)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:291)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:615)
>   at 
> org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.doFilter(RMAuthenticationFilter.java:82)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1211)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>   at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>   at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>   at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>   at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>   at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>   at org.mortbay.jetty.Server.handle(Server.java:326)
>   at 
> org.mortbay.jetty.Htt

[jira] [Commented] (YARN-4459) container-executor might kill process wrongly

2015-12-22 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067999#comment-15067999
 ] 

Naganarasimha G R commented on YARN-4459:
-

Is this issue similar to YARN-3678 ?

> container-executor might kill process wrongly
> -
>
> Key: YARN-4459
> URL: https://issues.apache.org/jira/browse/YARN-4459
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Jun Gong
>Assignee: Jun Gong
> Attachments: YARN-4459.01.patch, YARN-4459.02.patch
>
>
> When calling 'signal_container_as_user' in container-executor, it first 
> checks whether the process group exists; if not, it kills the process 
> itself (if the process exists). This is not reasonable, because the process 
> group not existing means the corresponding container has finished, so if 
> we kill the process itself, we just kill the wrong process.
> We found this happening in our cluster many times. We used the same account for 
> starting the NM and submitting the app, and container-executor sometimes killed 
> the NM (the wrongly killed process might just be a newly started thread that 
> was a child process of the NM).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4109) Exception on RM scheduler page loading with labels

2015-12-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067974#comment-15067974
 ] 

Hadoop QA commented on YARN-4109:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
48s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
12s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
33s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 58m 51s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 6s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK 
v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
24s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 136m 47s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
| JDK v1.7.0_91 Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestAMAuthorization |
|   | hadoop.yarn.server.resourcemanager.TestClientRMTokens |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12775857/YARN-4109_1.patch

[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-22 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067939#comment-15067939
 ] 

Varun Saxena commented on YARN-4224:


There was a formatting issue in one of my comments, so I am writing it here again.
bq. 3. Seems like there is no full path to locate one entity from the cluster, 
user, flow, run, app, entity type, and entity id. Are we omitting this endpoint 
deliberately?
There is: the {{\/entity\/\{uid\}\/}} endpoint. I hope this answers your question.

> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4224) Change the ATSv2 reader side REST interface to conform to current REST APIs' in YARN

2015-12-22 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067935#comment-15067935
 ] 

Varun Saxena commented on YARN-4224:


[~gtCarrera9],
{quote}
1. I understand we would like to have plural forms for listing (like /flows, 
/apps) and singular forms for detail (like /flow/{uid}). But then, why do we 
need both /runs/{uid} and /run/{uid}?
The same question also applies to apps. 
{quote}
You mean we should have just one endpoint and, based on delimiters in the UID, 
decide whether to fetch a single entity or multiple?

{quote}
For entities, we require both UID and type. Why type is not a part of UID 
(which means UID is not sufficient to identify an entity)?
{quote}
Because the query that comes before the query for entities, in which we return the 
UID, will be the query for apps. That query is against the application table, where 
we do not have entity-related information, so I cannot send all the possible entity 
types for an entity or a list of entities in the response. Hence, when we query a 
list of entities, it is within the scope of an app UID and an entity type, and the 
entity type has to be specified.

{quote}
Or, are you planning to support operations like "list all entities in a given 
entity type"?
{quote}
Yes. If we try to query all entity types and their related entities, it would 
require scanning quite a bit of the entity table, which can grow quite big. And 
from the UI, I envisage only queries for APP_ATTEMPT and CONTAINER, so we would 
know the entity type.

{quote}
If it is the latter, then do we want to consider put type into query parameters 
on end point entities?
{quote}
Normally in REST, mandatory params are kept as part of path. I expect entity 
type to be a mandatory param. I am assuming we do not want to support queries 
like get entities for all possible entity types. Thoughts ?

bq. For flows, why we' re not including an UID endpoint to locate one flow? 
This poses a challenge when we'd like to list all flow runs within one flow 
(or, do we have any other end points to do this work? ). 
When we query flows, we return all possible flow runs with it. So query for a 
single flow is not going to give any new information.
Moreover, runs endpoint will do it for you i.e. fetch all flowruns for a flow.

{quote}
3. Seems like there is no full path to locate one entity from the cluster, 
user, flow, run, app, entity type, and entity id. Are we omitting this endpoint 
deliberately? 
{quote}
There is: the {{\/entity\/{uid}\/}} endpoint. I hope this answers your question.

{quote}
As a side note, in this patch there are 3 types of "shortcuts" in the URL: omit 
the cluster id (with default cluster id), omit user id (with default user id) 
and directly access app id. I'm OK with direct accessing app ids (with cluster 
id), but do we want to omit the other two? Comments are more then welcome.
{quote}
Can you elaborate a bit on this ?
We have 3 endpoints. One with cluster id, one without cluster id(default 
cluster from config is taken) and one with UID. For apps, the UID endpoint will 
contain flow context information.

bq. I'm also debating with myself on this. Right now I'm leaning towards to 
make the UIDs transparent to the storage layer.
I am fine either way; I can see pros and cons for both.
The only concern I see is that if we are fetching a lot of entities and specify a 
high limit, we need to iterate over all the entities again to fill in the UID. If 
it is a mere 50-100 entities, the difference should be negligible, but what if it 
is very high?
Another thing we need to ponder is whether we need to support pagination for the 
UIs, and whether it would be possible to support it. We would then have to store 
some contextual information in the reader, or send some information back in the 
response and continue from there on the next pagination request, i.e. handle 
pagination ourselves. Not sure if we can do this before the 1st milestone though.
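
To make the two endpoint styles being discussed concrete, here is a purely 
illustrative JAX-RS sketch; the paths and parameter names are assumptions, not 
the actual reader API.
{code}
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.QueryParam;

// Hypothetical shapes only. UID style: the opaque UID carries the
// cluster/user/flow/run/app context, but the entity type must still be given.
// Hierarchical style: every context component appears in the path.
@Path("/ws/v2/timeline")
public class TimelineReaderEndpointsSketch {

  @GET
  @Path("/entities/{uid}/{entitytype}")
  public String getEntitiesByUid(@PathParam("uid") String uid,
      @PathParam("entitytype") String entityType,
      @QueryParam("limit") String limit) {
    return "...";
  }

  @GET
  @Path("/clusters/{clusterid}/users/{userid}/flows/{flowname}/runs/{runid}"
      + "/apps/{appid}/entities/{entitytype}")
  public String getEntities(@PathParam("clusterid") String clusterId,
      @PathParam("userid") String userId,
      @PathParam("flowname") String flowName,
      @PathParam("runid") String runId,
      @PathParam("appid") String appId,
      @PathParam("entitytype") String entityType) {
    return "...";
  }
}
{code}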


> Change the ATSv2 reader side REST interface to conform to current REST APIs' 
> in YARN
> 
>
> Key: YARN-4224
> URL: https://issues.apache.org/jira/browse/YARN-4224
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: YARN-2928
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: yarn-2928-1st-milestone
> Attachments: YARN-4224-YARN-2928.01.patch, 
> YARN-4224-feature-YARN-2928.wip.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4172) Extend DominantResourceCalculator to account for all resources

2015-12-22 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067867#comment-15067867
 ] 

Varun Vasudev commented on YARN-4172:
-

Looks like we need to rebase the YARN-3926 branch. Unfortunately, we can't do 
force pushes so we might have to use merge.

> Extend DominantResourceCalculator to account for all resources
> --
>
> Key: YARN-4172
> URL: https://issues.apache.org/jira/browse/YARN-4172
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4172-YARN-3926.001.patch, 
> YARN-4172-YARN-3926.002.patch
>
>
> Now that support for multiple resources is present in the resource class, we 
> need to modify DominantResourceCalculator to account for the new resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (YARN-4027) ApplicationHistory web UI(timeline V1) apps column rendering is incorrect

2015-12-22 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S resolved YARN-4027.
-
Resolution: Duplicate

Closing as duplicate

> ApplicationHistory web UI(timeline V1) apps column rendering is incorrect
> -
>
> Key: YARN-4027
> URL: https://issues.apache.org/jira/browse/YARN-4027
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>
> It is observed that after YARN-3948, the timeline web UI does not render 
> correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4172) Extend DominantResourceCalculator to account for all resources

2015-12-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067861#comment-15067861
 ] 

Hadoop QA commented on YARN-4172:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 22m 34s 
{color} | {color:red} Docker failed to build yetus/hadoop:a890a31. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12779018/YARN-4172-YARN-3926.002.patch
 |
| JIRA Issue | YARN-4172 |
| Powered by | Apache Yetus 0.2.0-SNAPSHOT   http://yetus.apache.org |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/10072/console |


This message was automatically generated.



> Extend DominantResourceCalculator to account for all resources
> --
>
> Key: YARN-4172
> URL: https://issues.apache.org/jira/browse/YARN-4172
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4172-YARN-3926.001.patch, 
> YARN-4172-YARN-3926.002.patch
>
>
> Now that support for multiple resources is present in the resource class, we 
> need to modify DominantResourceCalculator to account for the new resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4234) New put APIs in TimelineClient for ats v1.5

2015-12-22 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067853#comment-15067853
 ] 

Junping Du commented on YARN-4234:
--

Thanks [~xgong] for updating the patch. +1 on latest patch. I will commit it 
tomorrow if no further comments or objections from others.

> New put APIs in TimelineClient for ats v1.5
> ---
>
> Key: YARN-4234
> URL: https://issues.apache.org/jira/browse/YARN-4234
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-4234-2015-11-13.1.patch, 
> YARN-4234-2015-11-16.1.patch, YARN-4234-2015-11-16.2.patch, 
> YARN-4234-2015.2.patch, YARN-4234.1.patch, YARN-4234.2.patch, 
> YARN-4234.2015-11-12.1.patch, YARN-4234.2015-11-12.1.patch, 
> YARN-4234.2015-11-18.1.patch, YARN-4234.2015-11-18.2.patch, 
> YARN-4234.2015-11-18.patch, YARN-4234.2015-12-09.patch, 
> YARN-4234.2015-12-09.patch, YARN-4234.2015-12-17.1.patch, 
> YARN-4234.2015-12-18.1.patch, YARN-4234.2015-12-18.patch, 
> YARN-4234.2015-12-21.1.patch, YARN-4234.20151109.patch, 
> YARN-4234.20151110.1.patch, YARN-4234.2015.1.patch, YARN-4234.3.patch
>
>
> In this ticket, we will add new put APIs in timelineClient to let 
> clients/applications have the option to use ATS v1.5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4466) ResourceManager should tolerate unexpected exceptions to happen in non-critical subsystem/services like SystemMetricsPublisher

2015-12-22 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067852#comment-15067852
 ] 

Naganarasimha G R commented on YARN-4466:
-

Hi [~djp],
I was looking into the exception trace which caused the RM to go down in the 
YARN-4452 issue:
{code}
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.SystemMetricsPublisher.appAttemptRegistered(SystemMetricsPublisher.java:165)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1533)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AMRegisteredTransition.transition(RMAppAttemptImpl.java:1505)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:845)
at 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:110)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:796)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
{code}
Here {{rmDispatcher -> ApplicationAttemptEventDispatcher -> ... -> transition in 
RMAppAttemptImpl -> SMP.appAttemptRegistered}} causes an NPE. In the 
{{AsyncDispatcher.dispatch}} method, on an exception we check 
{{exitOnDispatchException}} and, if it is true, call System.exit. 
I was not able to find a check we could add there so that the RM doesn't fail; 
that NPE was simply a bug.
So IMO it's *not* always possible to cover every scenario, but what we could do 
is *expose an interface in AsyncDispatcher to allow setting the 
exitOnDispatchException variable, and the dispatchers used for ATS can set it to 
false* so that the RM doesn't go down for exceptions raised while dispatching 
ATS events (see the sketch below). Even with such a mechanism, though, we may 
still miss some hidden bugs.
Apart from the TimelinePublisher, the only other place in the RM where I can see 
an AsyncDispatcher whose failures/exceptions could safely be ignored is 
{{RMApplicationHistoryWriter}}, which is already deprecated!
Thoughts?
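
As a rough sketch of that proposal (this is not the actual {{AsyncDispatcher}} 
code; the setter name and the simplified dispatch loop are invented purely for 
illustration):
{code}
// Hypothetical sketch of the proposal above; not Hadoop's AsyncDispatcher.
public class TolerantDispatcherSketch {
  private volatile boolean exitOnDispatchException = true;

  // Proposed knob: dispatchers that only feed non-critical services
  // (e.g. the ATS publisher) would set this to false.
  public void setExitOnDispatchException(boolean exitOnError) {
    this.exitOnDispatchException = exitOnError;
  }

  void dispatch(Runnable event) {
    try {
      event.run();
    } catch (Throwable t) {
      if (exitOnDispatchException) {
        System.exit(-1);  // current behaviour: the whole RM goes down
      }
      // proposed behaviour for non-critical dispatchers: log and keep going
      System.err.println("Ignoring failure in non-critical dispatcher: " + t);
    }
  }
}
{code}
The RM would keep the default for its critical dispatchers and only flip the 
flag on the dispatcher feeding SystemMetricsPublisher.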

> ResourceManager should tolerate unexpected exceptions to happen in 
> non-critical subsystem/services like SystemMetricsPublisher
> --
>
> Key: YARN-4466
> URL: https://issues.apache.org/jira/browse/YARN-4466
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Naganarasimha G R
>
> From my comment in 
> YARN-4452(https://issues.apache.org/jira/browse/YARN-4452?focusedCommentId=15059805&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15059805),
>  we should make RM more robust with ignore (but log) unexpected exception in 
> its non-critical subsystems/services.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2599) Standby RM should also expose some jmx and metrics

2015-12-22 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067814#comment-15067814
 ] 

Rohith Sharma K S commented on YARN-2599:
-

Apologies for missing the sequence of comments :-( I will try to make progress 
on this.

> Standby RM should also expose some jmx and metrics
> --
>
> Key: YARN-2599
> URL: https://issues.apache.org/jira/browse/YARN-2599
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.5.1
>Reporter: Karthik Kambatla
>Assignee: Rohith Sharma K S
>
> YARN-1898 redirects jmx and metrics to the Active. As discussed there, we 
> need to separate out metrics displayed so the Standby RM can also be 
> monitored. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4172) Extend DominantResourceCalculator to account for all resources

2015-12-22 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-4172:

Attachment: YARN-4172-YARN-3926.002.patch

bq. 1) Who will update DRC.resourceNames, I found currently it hardcoded to 
memory and vcores.

This will be handled as part of later patches. For now, only memory and vcores 
are supported. Eventually, the list will be generated via a config.

bq. 2) DominantResourceCalculator#compare: Need handle the case when 
resourceName not existed at lhs or rhs?

This would throw an exception in getResourceInformation.

bq. Could use ResourceInformation.compareTo instead of 
ResourceInformation.compareValue since resourceName equals to each other? Is it 
still necessary to keep compareValue?

Good point. Fixed.

bq. 3) Suggest to use > = < to do compare since of switch (diff).

Fixed.

bq. 4) DRC#computeAvailableContainers need consider denominator == 0?

The current implementation doesn't handle the case - how should I handle it?

bq. 5) Is it possible we can return a dummy ResourceInformation (with value == 
0) instead of check exception at lots of places.

I would prefer to throw an exception - FairScheduler uses 0 Resource objects in 
a lot of places - and I don't want to return something that indicates the 
operation was successful.
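
For context, a self-contained sketch of the per-resource dominant-share 
comparison being discussed (this is not the patch: it uses a plain 
{{Map<String, Long>}} in place of {{Resource}}/{{ResourceInformation}}, treats a 
resource missing from lhs/rhs as zero rather than throwing, skips a resource 
whose cluster total is zero as one possible answer to point 4, and uses plain 
comparisons instead of {{switch (diff)}} per point 3):
{code}
import java.util.List;
import java.util.Map;

// Illustrative only: compares two resource vectors by their dominant share of
// the cluster total, in the spirit of DominantResourceCalculator.
public final class MultiResourceCompareSketch {
  public static int compare(Map<String, Long> clusterTotal,
                            Map<String, Long> lhs,
                            Map<String, Long> rhs,
                            List<String> resourceNames) {
    double lhsShare = 0, rhsShare = 0;
    for (String name : resourceNames) {
      long total = clusterTotal.getOrDefault(name, 0L);
      if (total == 0) {
        continue; // zero denominator: ignore this resource in the comparison
      }
      long lhsVal = lhs.getOrDefault(name, 0L);
      long rhsVal = rhs.getOrDefault(name, 0L);
      lhsShare = Math.max(lhsShare, (double) lhsVal / total);
      rhsShare = Math.max(rhsShare, (double) rhsVal / total);
    }
    // Plain > / < comparisons rather than switch (diff).
    if (lhsShare > rhsShare) {
      return 1;
    }
    if (lhsShare < rhsShare) {
      return -1;
    }
    return 0;
  }
}
{code}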

> Extend DominantResourceCalculator to account for all resources
> --
>
> Key: YARN-4172
> URL: https://issues.apache.org/jira/browse/YARN-4172
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: YARN-4172-YARN-3926.001.patch, 
> YARN-4172-YARN-3926.002.patch
>
>
> Now that support for multiple resources is present in the resource class, we 
> need to modify DominantResourceCalculator to account for the new resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4109) Exception on RM scheduler page loading with labels

2015-12-22 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4109:

Priority: Minor  (was: Major)

> Exception on RM scheduler page loading with labels
> --
>
> Key: YARN-4109
> URL: https://issues.apache.org/jira/browse/YARN-4109
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Mohammad Shahid Khan
>Priority: Minor
> Attachments: YARN-4109_1.patch
>
>
> Configure node labels and load the scheduler page.
> On each reload of the page, the exception below is thrown in the logs:
> {code}
> 2015-09-03 11:27:08,544 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
> handling URI: /cluster/scheduler
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>   at 
> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
>   at 
> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
>   at 
> com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:139)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
>   at 
> com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
>   at 
> com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
>   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:663)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:291)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:615)
>   at 
> org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.doFilter(RMAuthenticationFilter.java:82)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1211)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>   at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>   at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>   at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>   at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>   at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>   at org.mortbay.jetty.Server.handle(Server.java:326)
>   at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>   at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpCo

[jira] [Commented] (YARN-4109) Exception on RM scheduler page loading with labels

2015-12-22 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067782#comment-15067782
 ] 

Rohith Sharma K S commented on YARN-4109:
-

+1 lgtm, pending jenkins

> Exception on RM scheduler page loading with labels
> --
>
> Key: YARN-4109
> URL: https://issues.apache.org/jira/browse/YARN-4109
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Mohammad Shahid Khan
> Attachments: YARN-4109_1.patch
>
>
> Configure node labels and load the scheduler page.
> On each reload of the page, the exception below is thrown in the logs:
> {code}
> 2015-09-03 11:27:08,544 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error 
> handling URI: /cluster/scheduler
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>   at 
> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
>   at 
> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
>   at 
> com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebAppFilter.doFilter(RMWebAppFilter.java:139)
>   at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
>   at 
> com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
>   at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
>   at 
> com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
>   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:663)
>   at 
> org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:291)
>   at 
> org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:615)
>   at 
> org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilter.doFilter(RMAuthenticationFilter.java:82)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1211)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>   at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>   at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>   at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>   at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>   at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>   at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>   at org.mortbay.jetty.Server.handle(Server.java:326)
>   at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>   at 
> org.mortbay.jetty.HttpConnection$RequestHandler.h

[jira] [Updated] (YARN-4324) AM hang more than 10 min was kill by RM

2015-12-22 Thread tangshangwen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tangshangwen updated YARN-4324:
---
Attachment: (was: am105361log.tar.gz)

> AM hang more than 10 min was kill by RM
> ---
>
> Key: YARN-4324
> URL: https://issues.apache.org/jira/browse/YARN-4324
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: tangshangwen
> Attachments: logs.rar, yarn-nodemanager-dumpam.log
>
>
> These are my logs:
> 2015-11-02 01:14:54,175 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 2865
> 2015-11-02 01:14:54,176 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: 
> job_1446203652278_135526Job Transitioned from RUNNING to COMMITTING   
> 2015-11-02 01:14:54,176 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
> attempt_1446203652278_135526_m_001777_1 TaskAttempt Transition
> ed from UNASSIGNED to KILLED
> 2015-11-02 01:14:54,176 INFO [CommitterEvent Processor #1] 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing 
> the event EventType: JOB_COMMIT  
> 2015-11-02 01:24:15,851 INFO [Thread-1] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster received a 
> signal. Signaling RMCommunicator and JobHistoryEventHandler.
> 2015-11-02 01:24:15,851 INFO [Thread-1] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: RMCommunicator 
> notified that iSignalled is: true
> 2015-11-02 01:24:15,851 INFO [Thread-1] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify RMCommunicator 
> isAMLastRetry: true
> The Hive map progress reached 100%, then dropped back to map 0%, and the job failed!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-4324) AM hang more than 10 min was kill by RM

2015-12-22 Thread tangshangwen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tangshangwen updated YARN-4324:
---
Attachment: am105361log.tar.gz

I have uploaded another AM log.

> AM hang more than 10 min was kill by RM
> ---
>
> Key: YARN-4324
> URL: https://issues.apache.org/jira/browse/YARN-4324
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: tangshangwen
> Attachments: am105361log.tar.gz, logs.rar, yarn-nodemanager-dumpam.log
>
>
> These are my logs:
> 2015-11-02 01:14:54,175 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 2865
> 2015-11-02 01:14:54,176 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: 
> job_1446203652278_135526Job Transitioned from RUNNING to COMMITTING   
> 2015-11-02 01:14:54,176 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
> attempt_1446203652278_135526_m_001777_1 TaskAttempt Transition
> ed from UNASSIGNED to KILLED
> 2015-11-02 01:14:54,176 INFO [CommitterEvent Processor #1] 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing 
> the event EventType: JOB_COMMIT  
> 2015-11-02 01:24:15,851 INFO [Thread-1] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: MRAppMaster received a 
> signal. Signaling RMCommunicator and JobHistoryEventHandler.
> 2015-11-02 01:24:15,851 INFO [Thread-1] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: RMCommunicator 
> notified that iSignalled is: true
> 2015-11-02 01:24:15,851 INFO [Thread-1] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify RMCommunicator 
> isAMLastRetry: true
> The Hive map progress reached 100%, then dropped back to map 0%, and the job failed!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3458) CPU resource monitoring in Windows

2015-12-22 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067745#comment-15067745
 ] 

Inigo Goiri commented on YARN-3458:
---

[~cnauroth], thank you very much for the review!

> CPU resource monitoring in Windows
> --
>
> Key: YARN-3458
> URL: https://issues.apache.org/jira/browse/YARN-3458
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.7.0
> Environment: Windows
>Reporter: Inigo Goiri
>Assignee: Inigo Goiri
>Priority: Minor
>  Labels: BB2015-05-TBR, containers, metrics, windows
> Fix For: 2.8.0
>
> Attachments: YARN-3458-1.patch, YARN-3458-2.patch, YARN-3458-3.patch, 
> YARN-3458-4.patch, YARN-3458-5.patch, YARN-3458-6.patch, YARN-3458-7.patch, 
> YARN-3458-8.patch, YARN-3458-9.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> The current implementation of getCpuUsagePercent() for 
> WindowsBasedProcessTree is left as unavailable. Attached a proposal of how to 
> do it. I reused the CpuTimeTracker using 1 jiffy=1ms.
> This was left open by YARN-3122.
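
As a standalone illustration of the arithmetic described above (treat 1 jiffy as 
1 ms and turn the delta in cumulative CPU time into a percentage of the 
wall-clock interval), not the actual WindowsBasedProcessTree/CpuTimeTracker 
code:
{code}
// Illustrative sketch only: percentage of CPU used between two samples.
// On a multi-core process the result can legitimately exceed 100.
public class CpuUsageSketch {
  private long lastCumulativeCpuMs = -1;
  private long lastSampleTimeMs = -1;

  /** Returns CPU usage in percent since the previous sample, or -1 on the first call. */
  public float update(long cumulativeCpuMs, long nowMs) {
    float percent = -1f;
    if (lastSampleTimeMs >= 0 && nowMs > lastSampleTimeMs) {
      percent = 100f * (cumulativeCpuMs - lastCumulativeCpuMs)
          / (float) (nowMs - lastSampleTimeMs);
    }
    lastCumulativeCpuMs = cumulativeCpuMs;
    lastSampleTimeMs = nowMs;
    return percent;
  }
}
{code}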



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3692) Allow REST API to set a user generated message when killing an application

2015-12-22 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067730#comment-15067730
 ] 

Rohith Sharma K S commented on YARN-3692:
-

Thanks [~rjainqb] for sharing your use case.
It has been quite a long time since I last looked into this JIRA; I will make 
progress on it. Along the same lines, we could also add a diagnostic message 
when failing an attempt.


> Allow REST API to set a user generated message when killing an application
> --
>
> Key: YARN-3692
> URL: https://issues.apache.org/jira/browse/YARN-3692
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Rajat Jain
>Assignee: Rohith Sharma K S
>
> Currently YARN's REST API supports killing an application without setting a 
> diagnostic message. It would be good to provide that support.
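
For illustration, the existing kill call is a PUT of a small JSON body, 
{"state":"KILLED"}, to the application's state resource; the requested support 
would amount to accepting an extra, user-supplied field in that body. The 
"diagnostics" field, RM address, and application id below are hypothetical:
{code}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Sketch only: kill an application via the RM REST API with a proposed
// user-supplied message ("diagnostics" is the field this JIRA asks for).
public class KillWithMessageSketch {
  public static void main(String[] args) throws Exception {
    String rm = "http://rm-host:8088";                 // placeholder RM address
    String appId = "application_1450000000000_0001";   // placeholder app id
    String body = "{\"state\":\"KILLED\","
        + "\"diagnostics\":\"Killed by operator: superseded by a rerun\"}";

    HttpURLConnection conn = (HttpURLConnection) new URL(
        rm + "/ws/v1/cluster/apps/" + appId + "/state").openConnection();
    conn.setRequestMethod("PUT");
    conn.setDoOutput(true);
    conn.setRequestProperty("Content-Type", "application/json");
    try (OutputStream out = conn.getOutputStream()) {
      out.write(body.getBytes(StandardCharsets.UTF_8));
    }
    System.out.println("HTTP " + conn.getResponseCode());
  }
}
{code}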



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)