[jira] [Assigned] (YARN-5479) FairScheduler: Scheduling performance improvement

2016-08-06 Thread He Tianyi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Tianyi reassigned YARN-5479:
---

Assignee: He Tianyi

> FairScheduler: Scheduling performance improvement
> -
>
> Key: YARN-5479
> URL: https://issues.apache.org/jira/browse/YARN-5479
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: He Tianyi
>Assignee: He Tianyi
>
> Currently ResourceManager uses a single thread to handle async events for 
> scheduling. As the number of nodes grows, more events need to be processed 
> in time by FairScheduler. Also, an increased number of applications & queues 
> slows down the processing of each single event. 
> There are two cases in which slow processing of nodeUpdate events is problematic:
> A. Global throughput is lower than the number of nodes per heartbeat round. 
> This keeps resources from being allocated because of the inefficiency.
> B. Global throughput meets the need, but in some rounds the events of some 
> nodes cannot be processed before the next heartbeat. This makes handling of 
> burst requests inefficient (i.e. a newly submitted MapReduce application 
> cannot get all of its tasks launched quickly even when enough resources are 
> available).
> Pretty sure some people will eventually encounter the problem after a single 
> cluster is scaled to several thousand nodes (even with {{assignmultiple}} 
> enabled).
> This issue proposes several performance optimizations in the FairScheduler 
> {{nodeUpdate}} method. To be specific:
> A. Trading off fairness for efficiency, queue & app sorting can be skipped 
> (or should this be called 'delayed sorting'?). We can either start another 
> dedicated thread to do the sorting & updating, or actually perform sorting 
> only after the current result has been used several times (say, sort once 
> every 100 calls).
> B. Performing calculations on {{Resource}} instances is expensive, since at 
> least 2 objects ({{ResourceImpl}} and its proto builder) are created each 
> time (using the 'immutable' APIs). The overhead can be eliminated with a 
> lightweight implementation of Resource that does not instantiate a builder 
> until necessary, because most instances are used as intermediate results in 
> the scheduler rather than being exchanged via IPC. Also, {{createResource}} 
> uses reflection, which can be replaced by a plain {{new}} (for scheduler 
> usage only). Furthermore, perhaps we could 'intern' resources to avoid 
> allocation.
> C. Other minor changes: e.g. move the {{updateRootMetrics}} call into 
> {{update}}, making root queue metrics eventually consistent (which may 
> satisfy most needs), or introduce counters for {{getResourceUsage}} and 
> update resource usage incrementally instead of recalculating it each time.
> With A and B, I was seeing a 4x improvement in a cluster with 2K nodes.
> Suggestions? Opinions?






[jira] [Created] (YARN-5479) FairScheduler: Scheduling performance improvement

2016-08-06 Thread He Tianyi (JIRA)
He Tianyi created YARN-5479:
---

 Summary: FairScheduler: Scheduling performance improvement
 Key: YARN-5479
 URL: https://issues.apache.org/jira/browse/YARN-5479
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler, resourcemanager
Affects Versions: 2.6.0
Reporter: He Tianyi


Currently ResourceManager uses a single thread to handle async events for 
scheduling. As the number of nodes grows, more events need to be processed in 
time by FairScheduler. Also, an increased number of applications & queues slows 
down the processing of each single event. 

There are two cases in which slow processing of nodeUpdate events is problematic:
A. Global throughput is lower than the number of nodes per heartbeat round. 
This keeps resources from being allocated because of the inefficiency.
B. Global throughput meets the need, but in some rounds the events of some 
nodes cannot be processed before the next heartbeat. This makes handling of 
burst requests inefficient (i.e. a newly submitted MapReduce application cannot 
get all of its tasks launched quickly even when enough resources are available).

Pretty sure some people will eventually encounter the problem after a single 
cluster is scaled to several thousand nodes (even with {{assignmultiple}} enabled).

This issue proposes several performance optimizations in the FairScheduler 
{{nodeUpdate}} method. To be specific:
A. Trading off fairness for efficiency, queue & app sorting can be skipped (or 
should this be called 'delayed sorting'?). We can either start another 
dedicated thread to do the sorting & updating, or actually perform sorting only 
after the current result has been used several times (say, sort once every 100 
calls).
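As an illustration of the delayed-sorting idea, here is a minimal Java sketch 
(the class and names below are hypothetical, not actual FairScheduler code): 
the expensive comparator sort is refreshed only once every N calls, and the 
cached order is reused in between.
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Reuses a cached sort order for N calls before re-sorting (illustrative only). */
class DelayedSorter<T> {
  private final Comparator<T> comparator;
  private final int sortInterval;                   // e.g. 100 calls between sorts
  private List<T> cachedOrder = new ArrayList<>();
  private int callsSinceSort = Integer.MAX_VALUE;   // forces a sort on first use

  DelayedSorter(Comparator<T> comparator, int sortInterval) {
    this.comparator = comparator;
    this.sortInterval = sortInterval;
  }

  /** Returns the children in a possibly stale sorted order, re-sorting only periodically. */
  synchronized List<T> sortedView(List<T> children) {
    if (callsSinceSort >= sortInterval || cachedOrder.size() != children.size()) {
      cachedOrder = new ArrayList<>(children);
      Collections.sort(cachedOrder, comparator);
      callsSinceSort = 0;
    }
    callsSinceSort++;
    return cachedOrder;
  }
}
{code}
The staleness of the order is bounded by the sort interval, which is exactly the 
fairness-vs-efficiency trade-off described above.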

B. Performing calculations on {{Resource}} instances is expensive, since at 
least 2 objects ({{ResourceImpl}} and its proto builder) are created each time 
(using the 'immutable' APIs). The overhead can be eliminated with a lightweight 
implementation of Resource that does not instantiate a builder until necessary, 
because most instances are used as intermediate results in the scheduler rather 
than being exchanged via IPC. Also, {{createResource}} uses reflection, which 
can be replaced by a plain {{new}} (for scheduler usage only). Furthermore, 
perhaps we could 'intern' resources to avoid allocation.
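A rough sketch of what such a lightweight class could look like (hypothetical, 
not the actual YARN {{Resource}} API): plain fields, constructed with a plain 
{{new}} instead of reflection, with any protobuf conversion deferred until the 
value actually has to cross an IPC boundary.
{code}
/** Illustrative lightweight resource: plain fields, no proto builder until serialization. */
class LightweightResource {
  private long memory;
  private int vcores;

  LightweightResource(long memory, int vcores) {  // plain 'new', no reflection
    this.memory = memory;
    this.vcores = vcores;
  }

  long getMemory() { return memory; }
  int getVcores()  { return vcores; }

  /** In-place arithmetic avoids allocating a new object per operation. */
  void addTo(long memoryDelta, int vcoresDelta) {
    this.memory += memoryDelta;
    this.vcores += vcoresDelta;
  }

  // A real implementation would build the protobuf form only here, at the point
  // where the value is handed to IPC (the method below is hypothetical):
  // ResourceProto toProto() { ... }
}
{code}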

C. Other minor changes: e.g. move the {{updateRootMetrics}} call into 
{{update}}, making root queue metrics eventually consistent (which may satisfy 
most needs), or introduce counters for {{getResourceUsage}} and update resource 
usage incrementally instead of recalculating it each time.
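For the incremental usage idea, a small hypothetical sketch (again not the real 
queue code): counters are adjusted on every allocation and release, so reading 
the current usage becomes an O(1) lookup instead of a recalculation.
{code}
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative incremental usage counters: updated on allocate/release, read in O(1). */
class IncrementalUsage {
  private final AtomicLong usedMemory = new AtomicLong();
  private final AtomicLong usedVcores = new AtomicLong();

  void onAllocate(long memory, int vcores) {
    usedMemory.addAndGet(memory);
    usedVcores.addAndGet(vcores);
  }

  void onRelease(long memory, int vcores) {
    usedMemory.addAndGet(-memory);
    usedVcores.addAndGet(-vcores);
  }

  long getUsedMemory() { return usedMemory.get(); }
  long getUsedVcores() { return usedVcores.get(); }
}
{code}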

With A and B, I was seeing a 4x improvement in a cluster with 2K nodes.

Suggestions? Opinions?






[jira] [Commented] (YARN-5457) Refactor DistributedScheduling framework to pull out common functionality

2016-08-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410724#comment-15410724
 ] 

Hadoop QA commented on YARN-5457:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
42s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 52s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 19s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
44s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 2s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 6s 
{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 
40s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 47s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 47s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 29s 
{color} | {color:red} root: The patch generated 3 new + 364 unchanged - 14 
fixed = 367 total (was 378) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 19s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 
41s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s 
{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 
54s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 15s 
{color} | {color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common 
generated 1 new + 159 unchanged - 0 fixed = 160 total (was 159) {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s 
{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 16s {color} 
| {color:red} hadoop-yarn-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 28s 
{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 13m 19s 
{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 37m 19s {color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 30s {color} 
| {color:red} hadoop-yarn-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 111m 32s 
{color} | {color:green} hadoop-mapreduce-client-jobclient in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
30s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 227m 45s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.logaggregation.TestAggregatedL

[jira] [Commented] (YARN-5287) LinuxContainerExecutor fails to set proper permission

2016-08-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410709#comment-15410709
 ] 

Hudson commented on YARN-5287:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #10230 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10230/])
YARN-5287. LinuxContainerExecutor fails to set proper permission. 
(naganarasimha_gr: rev 131d58a24edcf3b492a7dd0fa5bb3dbf27daf95d)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c


> LinuxContainerExecutor fails to set proper permission
> -
>
> Key: YARN-5287
> URL: https://issues.apache.org/jira/browse/YARN-5287
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.2
>Reporter: Ying Zhang
>Assignee: Ying Zhang
>Priority: Minor
> Attachments: YARN-5287-tmp.patch, YARN-5287.003.patch, 
> YARN-5287.004.patch, YARN-5287.005.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> LinuxContainerExecutor fails to set the proper permissions on the local 
> directories (i.e., /hadoop/yarn/local/usercache/... by default) if the cluster 
> has been configured with a restrictive umask, e.g. umask 077. The job failed 
> with the following reason:
> Path /hadoop/yarn/local/usercache/ambari-qa/appcache/application_ has 
> permission 700 but needs permission 750
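The actual fix lives in the native container-executor (C) sources listed in the 
commit above. Purely as an illustration of the underlying issue, the hedged Java 
snippet below creates a directory and then sets the required 750 mode explicitly, 
so that a restrictive process umask cannot reduce it (the path is hypothetical, 
not a real NodeManager local dir):
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class ExplicitPerms {
  public static void main(String[] args) throws IOException {
    // Hypothetical directory, standing in for a usercache/appcache dir.
    Path dir = Paths.get("/tmp/usercache-demo/appcache");
    Files.createDirectories(dir);  // the resulting mode depends on the process umask
    // Explicitly set 750 (rwxr-x---) so umask 077 cannot leave it at 700.
    Set<PosixFilePermission> perms = PosixFilePermissions.fromString("rwxr-x---");
    Files.setPosixFilePermissions(dir, perms);
  }
}
{code}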






[jira] [Updated] (YARN-5457) Refactor DistributedScheduling framework to pull out common functionality

2016-08-06 Thread Arun Suresh (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun Suresh updated YARN-5457:
--
Attachment: YARN-5457.002.patch

Updating the patch to fix the javadoc warning, some of the checkstyle issues, and the unit tests.

> Refactor DistributedScheduling framework to pull out common functionality
> -
>
> Key: YARN-5457
> URL: https://issues.apache.org/jira/browse/YARN-5457
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-5457.001.patch, YARN-5457.002.patch
>
>
> Opening this JIRA to track some refactoring missed in YARN-5113:






[jira] [Comment Edited] (YARN-5450) Enhance logging for Cluster.java around InetSocketAddress

2016-08-06 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410641#comment-15410641
 ] 

Varun Saxena edited comment on YARN-5450 at 8/6/16 3:55 PM:


[~vrushalic], thanks for the patch.
Will move this JIRA to the MapReduce project as the change is strictly in the 
MapReduce project.

Few comments:
* In my opinion, there is no need to check LOG#isInfoEnabled because logs are 
typically at INFO level anyway. Our codebase is full of logging at INFO level 
without the check.
* I am not 100% sure of the use case, but I think we should move the log 
statement above the check for clientProtocolProvider being null, because an 
exception is thrown there. This log will also be more useful if a 
ClientProtocolProvider implementation cannot be picked. Maybe move this log 
above the for loop?
{code}
if (null == clientProtocolProvider || null == client) {
  throw initEx;
}
if (LOG.isInfoEnabled() && jobTrackAddr != null) {
  LOG.info("Initialized Cluster for source=" + jobTrackAddr.toString());
}
{code}

* Change {{LOG.info("Initializing Cluster for source=}} to 
{{LOG.info("Initializing Cluster for job tracker }} ?


was (Author: varun_saxena):
[~vrushalic], thanks for the patch.
Will move this JIRA to the MapReduce project as the change is strictly in the 
MapReduce project.

Few comments:
# In my opinion, there is no need to check LOG#isInfoEnabled because logs are 
typically at INFO level anyway. Our codebase is full of logging at INFO level 
without the check.
# I am not 100% sure of the use case, but I think we should move the log 
statement above the check for clientProtocolProvider being null, because an 
exception is thrown there. This log will also be more useful if a 
ClientProtocolProvider implementation cannot be picked. Maybe move this log 
above the for loop?
{code}
if (null == clientProtocolProvider || null == client) {
  throw initEx;
}
if (LOG.isInfoEnabled() && jobTrackAddr != null) {
  LOG.info("Initialized Cluster for source=" + jobTrackAddr.toString());
}
{code}

  3. Change {{LOG.info("Initializing Cluster for source=}} to 
{{LOG.info("Initializing Cluster for job tracker }} ?

> Enhance logging for Cluster.java around InetSocketAddress
> -
>
> Key: YARN-5450
> URL: https://issues.apache.org/jira/browse/YARN-5450
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: sarun singla
>Assignee: Vrushali C
>Priority: Minor
>  Labels: YARN
> Attachments: YARN-5450.01.patch
>
>
> We need to add more logging to the Cluster.java class around the 
> "initialize(InetSocketAddress jobTrackAddr, Configuration conf)" method, 
> e.g. to report the source of the property.






[jira] [Commented] (YARN-5450) Enhance logging for Cluster.java around InetSocketAddress

2016-08-06 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410641#comment-15410641
 ] 

Varun Saxena commented on YARN-5450:


[~vrushalic], thanks for the patch.
Will move this JIRA to the MapReduce project as the change is strictly in the 
MapReduce project.

A couple of comments:
# In my opinion, there is no need to check LOG#isInfoEnabled because logs are 
typically at INFO level anyway. Our codebase is full of logging at INFO level 
without the check.
# I am not 100% sure of the use case, but I think we should move the log 
statement above the check for clientProtocolProvider being null, because an 
exception is thrown there. This log will also be more useful if a 
ClientProtocolProvider implementation cannot be picked. Maybe move this log 
above the for loop? (A rough sketch of the suggested ordering follows after 
these comments.)
{code}
if (null == clientProtocolProvider || null == client) {
  throw initEx;
}
if (LOG.isInfoEnabled() && jobTrackAddr != null) {
  LOG.info("Initialized Cluster for source=" + jobTrackAddr.toString());
}
{code}

3. Change {{LOG.info("Initializing Cluster for source=}} to 
{{LOG.info("Initializing Cluster for job tracker }} ?

> Enhance logging for Cluster.java around InetSocketAddress
> -
>
> Key: YARN-5450
> URL: https://issues.apache.org/jira/browse/YARN-5450
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: yarn
>Reporter: sarun singla
>Assignee: Vrushali C
>Priority: Minor
>  Labels: YARN
> Attachments: YARN-5450.01.patch
>
>
> We need to add more logging to the Cluster.java class around the 
> "initialize(InetSocketAddress jobTrackAddr, Configuration conf)" method, 
> e.g. to report the source of the property.






[jira] [Comment Edited] (YARN-5450) Enhance logging for Cluster.java around InetSocketAddress

2016-08-06 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15410641#comment-15410641
 ] 

Varun Saxena edited comment on YARN-5450 at 8/6/16 3:55 PM:


[~vrushalic], thanks for the patch.
Will move this JIRA to the MapReduce project as the change is strictly in the 
MapReduce project.

Few comments:
# In my opinion, there is no need to check LOG#isInfoEnabled because logs are 
typically at INFO level anyway. Our codebase is full of logging at INFO level 
without the check.
# I am not 100% sure of the use case, but I think we should move the log 
statement above the check for clientProtocolProvider being null, because an 
exception is thrown there. This log will also be more useful if a 
ClientProtocolProvider implementation cannot be picked. Maybe move this log 
above the for loop?
{code}
if (null == clientProtocolProvider || null == client) {
  throw initEx;
}
if (LOG.isInfoEnabled() && jobTrackAddr != null) {
  LOG.info("Initialized Cluster for source=" + jobTrackAddr.toString());
}
{code}

  3. Change {{LOG.info("Initializing Cluster for source=}} to 
{{LOG.info("Initializing Cluster for job tracker }} ?


was (Author: varun_saxena):
[~vrushalic], thanks for the patch.
Will move this JIRA to the MapReduce project as the change is strictly in the 
MapReduce project.

A couple of comments:
# In my opinion, there is no need to check LOG#isInfoEnabled because logs are 
typically at INFO level anyway. Our codebase is full of logging at INFO level 
without the check.
# I am not 100% sure of the use case, but I think we should move the log 
statement above the check for clientProtocolProvider being null, because an 
exception is thrown there. This log will also be more useful if a 
ClientProtocolProvider implementation cannot be picked. Maybe move this log 
above the for loop?
{code}
if (null == clientProtocolProvider || null == client) {
  throw initEx;
}
if (LOG.isInfoEnabled() && jobTrackAddr != null) {
  LOG.info("Initialized Cluster for source=" + jobTrackAddr.toString());
}
{code}

3. Change {{LOG.info("Initializing Cluster for source=}} to 
{{LOG.info("Initializing Cluster for job tracker }} ?

> Enhance logging for Cluster.java around InetSocketAddress
> -
>
> Key: YARN-5450
> URL: https://issues.apache.org/jira/browse/YARN-5450
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: yarn
>Reporter: sarun singla
>Assignee: Vrushali C
>Priority: Minor
>  Labels: YARN
> Attachments: YARN-5450.01.patch
>
>
> We need to add more logging to the Cluster.java class around the 
> "initialize(InetSocketAddress jobTrackAddr, Configuration conf)" method, 
> e.g. to report the source of the property.


