[jira] [Commented] (YARN-9258) Support to specify allocation tags without constraint in distributed shell CLI

2019-02-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773779#comment-16773779
 ] 

Hadoop QA commented on YARN-9258:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
37s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 10s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 23s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
48s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m  
6s{color} | {color:green} hadoop-yarn-applications-distributedshell in the 
patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
22s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
39s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 99m 14s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9258 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12959560/YARN-9258-004.patch |
| Optional Tests |  dupname  asflicense  compile  ja

[jira] [Commented] (YARN-8589) ATS TimelineACLsManager checkAccess is slow

2019-02-20 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773758#comment-16773758
 ] 

Prabhu Joseph commented on YARN-8589:
-

[~Rakesh_Shah] The getEntities API (used by the Tez UI) returns the set of 
entities that match the query params and that the requesting user has access 
to. getEntities becomes slow when checkAccess has to run for every entity, so 
the Tez UI views for listing apps and app details are slower because of this.

A simple test case: do n putEntities calls, then call getEntities with ACLs 
enabled; getEntities will be very slow compared to ACLs disabled or an admin 
user. We can also test with MapReduce - run MapReduce jobs, let the RM 
putEntities, and use the API below for getEntities.

curl --negotiate -u : "http://prabhuzeppelin3.openstacklocal:8188/ws/v1/timeline/entities"

> ATS TimelineACLsManager checkAccess is slow
> ---
>
> Key: YARN-8589
> URL: https://issues.apache.org/jira/browse/YARN-8589
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Priority: Major
>
> The ATS REST API is very slow when there are more than one lakh (100,000) 
> entries if yarn.acl.enable is set to true, as TimelineACLsManager has to check 
> access for every entry. We can't disable yarn.acl.enable since all the YARN 
> ACLs use the same config. We can have a separate config to provide read access 
> to the ATS entries.
> {code}
> curl  http://:8188/ws/v1/timeline/HIVE_QUERY_ID
> {code}






[jira] [Commented] (YARN-9315) TestCapacitySchedulerMetrics fails intermittently

2019-02-20 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773748#comment-16773748
 ] 

Prabhu Joseph commented on YARN-9315:
-

[~cheersyang] Can you review this patch? It fixes TestCapacitySchedulerMetrics 
failing intermittently because the assert sometimes happens before the allocate 
completes. The failed test cases are not related and run fine locally.
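
The idea of the fix, as a sketch (the metric accessor name is assumed; this is 
not the actual patch):
{code}
// Poll until the scheduler has recorded both allocates, then assert,
// instead of asserting immediately after triggering the allocation.
import org.apache.hadoop.test.GenericTestUtils;

// inside testCSMetrics(), after the node heartbeats that drive allocation:
GenericTestUtils.waitFor(
    () -> metrics.getNumOfAllocates() == 2,  // assumed accessor name
    100,      // re-check every 100 ms
    10000);   // time the test out after 10 s without progress
{code}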

> TestCapacitySchedulerMetrics fails intermittently
> -
>
> Key: YARN-9315
> URL: https://issues.apache.org/jira/browse/YARN-9315
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: capacity scheduler
>Affects Versions: 3.1.2
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Attachments: YARN-9315-001.patch, YARN-9315-002.patch, 
> YARN-9315-002.patch
>
>
> TestCapacitySchedulerMetrics fails intermittently as the assert check happens 
> before the allocate completes - observed in YARN-8132
> {code}
> [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.177 
> s <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestCapacitySchedulerMetrics
> [ERROR] 
> testCSMetrics(org.apache.hadoop.yarn.server.resourcemanager.TestCapacitySchedulerMetrics)
>   Time elapsed: 3.11 s  <<< FAILURE!
> java.lang.AssertionError: expected:<2> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at org.junit.Assert.assertEquals(Assert.java:631)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestCapacitySchedulerMetrics.testCSMetrics(TestCapacitySchedulerMetrics.java:101)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:1
> {code}






[jira] [Commented] (YARN-9317) DefaultAMSProcessor#allocate timelineServiceV2Enabled check is costly

2019-02-20 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773737#comment-16773737
 ] 

Prabhu Joseph commented on YARN-9317:
-

Thanks [~bibinchundatt] for the review. Attached the V2 patch with the changes. 
The test case failures are not related and run fine locally.
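
For reference, the shape of the change as a minimal sketch (assumed, not the 
committed patch):
{code}
// Read the config once at init time and reuse the cached boolean on the
// hot allocate() path, instead of re-evaluating it on every call.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class CachedTimelineCheck {
  private boolean timelineServiceV2Enabled;

  public void init(Configuration conf) {
    // evaluated once, not on every allocate()
    this.timelineServiceV2Enabled =
        YarnConfiguration.timelineServiceV2Enabled(conf);
  }

  public void allocate() {
    if (timelineServiceV2Enabled) {  // cheap field read on the hot path
      // publish allocation info to the timeline collector
    }
  }
}
{code}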

> DefaultAMSProcessor#allocate  timelineServiceV2Enabled check is costly
> --
>
> Key: YARN-9317
> URL: https://issues.apache.org/jira/browse/YARN-9317
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9317-001.patch, YARN-9317-002.patch
>
>
> {code}
> if (YarnConfiguration.timelineServiceV2Enabled(
>  getRmContext().getYarnConfiguration())) 
> {code}
> The check is required only once; do it in DefaultAMSProcessor#init and assign 
> the result to a boolean.






[jira] [Updated] (YARN-9317) DefaultAMSProcessor#allocate timelineServiceV2Enabled check is costly

2019-02-20 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9317:

Attachment: YARN-9317-002.patch

> DefaultAMSProcessor#allocate  timelineServiceV2Enabled check is costly
> --
>
> Key: YARN-9317
> URL: https://issues.apache.org/jira/browse/YARN-9317
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9317-001.patch, YARN-9317-002.patch
>
>
> {code}
> if (YarnConfiguration.timelineServiceV2Enabled(
>  getRmContext().getYarnConfiguration())) 
> {code}
> The check is required only once; do it in DefaultAMSProcessor#init and assign 
> the result to a boolean.






[jira] [Commented] (YARN-9317) DefaultAMSProcessor#allocate timelineServiceV2Enabled check is costly

2019-02-20 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773719#comment-16773719
 ] 

Bibin A Chundatt commented on YARN-9317:


Thank you [~Prabhu Joseph] for the patch.

{code}
163   private boolean timelineServiceEnabled;
{code}
Rename all  timelineServiceEnabled to  timelineServiceV2Enabled.

> DefaultAMSProcessor#allocate  timelineServiceV2Enabled check is costly
> --
>
> Key: YARN-9317
> URL: https://issues.apache.org/jira/browse/YARN-9317
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9317-001.patch
>
>
> {code}
> if (YarnConfiguration.timelineServiceV2Enabled(
>  getRmContext().getYarnConfiguration())) 
> {code}
> The check is required only once; do it in DefaultAMSProcessor#init and assign 
> the result to a boolean.






[jira] [Commented] (YARN-5933) ATS stale entries in active directory causes ApplicationNotFoundException in RM

2019-02-20 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773711#comment-16773711
 ] 

Bibin A Chundatt commented on YARN-5933:


[~Prabhu Joseph] Please close the Jira if no changes are required.

> ATS stale entries in active directory causes ApplicationNotFoundException in 
> RM
> ---
>
> Key: YARN-5933
> URL: https://issues.apache.org/jira/browse/YARN-5933
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>
> On a secure cluster where the ATS is down, a submitted Tez job will fail 
> while getting the TIMELINE_DELEGATION_TOKEN with the below exception
> {code}
> 0: jdbc:hive2://kerberos-2.openstacklocal:100> select csmallint from 
> alltypesorc group by csmallint;
> INFO  : Session is already open
> INFO  : Dag name: select csmallint from alltypesor...csmallint(Stage-1)
> INFO  : Tez session was closed. Reopening...
> ERROR : Failed to execute tez graph.
> java.lang.RuntimeException: Failed to connect to timeline server. Connection 
> retries limit exceeded. The posted timeline event may be missing
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:266)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:590)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.getDelegationToken(TimelineClientImpl.java:506)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getTimelineDelegationToken(YarnClientImpl.java:349)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.addTimelineDelegationToken(YarnClientImpl.java:330)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:250)
>   at 
> org.apache.tez.client.TezYarnClient.submitApplication(TezYarnClient.java:72)
>   at org.apache.tez.client.TezClient.start(TezClient.java:409)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:196)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.closeAndOpen(TezSessionPoolManager.java:311)
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:453)
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:180)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1728)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1485)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1262)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1126)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1121)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Tez YarnClient has received an applicationId from the RM. On restarting the 
> ATS now, the ATS tries to get the application report from the RM, and so the 
> RM throws an ApplicationNotFoundException. The ATS keeps on requesting, which 
> floods the RM.
> {code}
> RM logs:
> 2016-11-23 13:53:57,345 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new 
> applicationId: 5
> 2016-11-23 14:05:04,936 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 9 on 8050, call 
> org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
> from 172.26.71.120:37699 Call#26 Retry#0
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
> with id 'application_1479897867169_0005' doesn't exist in RM.
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.g

[jira] [Resolved] (YARN-5933) ATS stale entries in active directory causes ApplicationNotFoundException in RM

2019-02-20 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph resolved YARN-5933.
-
Resolution: Fixed

YARN-8201 fixes this issue. As a workaround, 
yarn.timeline-service.entity-group-fs-store.unknown-active-seconds at the ATS 
can be reduced to an hour.
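
For the workaround, the property can be lowered in yarn-site.xml on the ATS 
host (the one-hour value below is illustrative, per the comment above):
{code}
<!-- Treat entries in the active dir as stale after one hour, so they stop
     triggering repeated getApplicationReport calls against the RM. -->
<property>
  <name>yarn.timeline-service.entity-group-fs-store.unknown-active-seconds</name>
  <value>3600</value>
</property>
{code}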

> ATS stale entries in active directory causes ApplicationNotFoundException in 
> RM
> ---
>
> Key: YARN-5933
> URL: https://issues.apache.org/jira/browse/YARN-5933
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>
> On a secure cluster where the ATS is down, a submitted Tez job will fail 
> while getting the TIMELINE_DELEGATION_TOKEN with the below exception
> {code}
> 0: jdbc:hive2://kerberos-2.openstacklocal:100> select csmallint from 
> alltypesorc group by csmallint;
> INFO  : Session is already open
> INFO  : Dag name: select csmallint from alltypesor...csmallint(Stage-1)
> INFO  : Tez session was closed. Reopening...
> ERROR : Failed to execute tez graph.
> java.lang.RuntimeException: Failed to connect to timeline server. Connection 
> retries limit exceeded. The posted timeline event may be missing
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:266)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:590)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.getDelegationToken(TimelineClientImpl.java:506)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getTimelineDelegationToken(YarnClientImpl.java:349)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.addTimelineDelegationToken(YarnClientImpl.java:330)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:250)
>   at 
> org.apache.tez.client.TezYarnClient.submitApplication(TezYarnClient.java:72)
>   at org.apache.tez.client.TezClient.start(TezClient.java:409)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:196)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.closeAndOpen(TezSessionPoolManager.java:311)
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:453)
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:180)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1728)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1485)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1262)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1126)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1121)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Tez YarnClient has received an applicationId from the RM. On restarting the 
> ATS now, the ATS tries to get the application report from the RM, and so the 
> RM throws an ApplicationNotFoundException. The ATS keeps on requesting, which 
> floods the RM.
> {code}
> RM logs:
> 2016-11-23 13:53:57,345 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new 
> applicationId: 5
> 2016-11-23 14:05:04,936 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 9 on 8050, call 
> org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
> from 172.26.71.120:37699 Call#26 Retry#0
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
> with id 'application_1479897867169_0005' doesn't exist in RM.
>   at 
> org.apache.hadoop.yar

[jira] [Commented] (YARN-9298) Implement FS placement rules using PlacementRule interface

2019-02-20 Thread Wilfred Spiegelenburg (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773708#comment-16773708
 ] 

Wilfred Spiegelenburg commented on YARN-9298:
-

Thank you for the review [~yufeigu]; it took a bit longer than expected to work 
through 4 and 5 without polluting the code too much.
1) Done; added to all files changed.
2) Added tests for:
* FairQueuePlacementUtils
* PlacementFactory
* PlacementRule (FS added parts)
3) Removed the extra line.
4) That is how I started the implementation. I ran into a number of problems 
while instantiating the rules in the policy and then moved to this model. I 
have it working now without polluting the factory and/or rule with lots of 
FS-specific classes.
5) Done as part of the rewrite for 4).
6) Updated the javadoc for the method.
7) Fixed.
8) Removed; the exception is already logged higher up in the stack.


> Implement FS placement rules using PlacementRule interface
> --
>
> Key: YARN-9298
> URL: https://issues.apache.org/jira/browse/YARN-9298
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-9298.001.patch, YARN-9298.002.patch
>
>
> Implement existing placement rules of the FS using the PlacementRule 
> interface.
> Preparation for YARN-8967






[jira] [Updated] (YARN-9298) Implement FS placement rules using PlacementRule interface

2019-02-20 Thread Wilfred Spiegelenburg (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated YARN-9298:

Attachment: YARN-9298.002.patch

> Implement FS placement rules using PlacementRule interface
> --
>
> Key: YARN-9298
> URL: https://issues.apache.org/jira/browse/YARN-9298
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-9298.001.patch, YARN-9298.002.patch
>
>
> Implement existing placement rules of the FS using the PlacementRule 
> interface.
> Preparation for YARN-8967






[jira] [Commented] (YARN-8589) ATS TimelineACLsManager checkAccess is slow

2019-02-20 Thread Rakesh Shah (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773707#comment-16773707
 ] 

Rakesh Shah commented on YARN-8589:
---

Hi [~Prabhu Joseph],

Can you please elaborate on the issue, and can I test it with MapReduce?

 

> ATS TimelineACLsManager checkAccess is slow
> ---
>
> Key: YARN-8589
> URL: https://issues.apache.org/jira/browse/YARN-8589
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Priority: Major
>
> The ATS REST API is very slow when there are more than one lakh (100,000) 
> entries if yarn.acl.enable is set to true, as TimelineACLsManager has to check 
> access for every entry. We can't disable yarn.acl.enable since all the YARN 
> ACLs use the same config. We can have a separate config to provide read access 
> to the ATS entries.
> {code}
> curl  http://:8188/ws/v1/timeline/HIVE_QUERY_ID
> {code}






[jira] [Updated] (YARN-9258) Support to specify allocation tags without constraint in distributed shell CLI

2019-02-20 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9258:

Attachment: YARN-9258-004.patch

> Support to specify allocation tags without constraint in distributed shell CLI
> --
>
> Key: YARN-9258
> URL: https://issues.apache.org/jira/browse/YARN-9258
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: distributed-shell
>Affects Versions: 3.1.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9258-001.patch, YARN-9258-002.patch, 
> YARN-9258-003.patch, YARN-9258-004.patch
>
>
> DistributedShell PlacementSpec fails to parse 
> {color:#d04437}zk=1:spark=1,NOTIN,NODE,zk{color}
> {code}
> java.lang.IllegalArgumentException: Invalid placement spec: 
> zk=1:spark=1,NOTIN,NODE,zk
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.PlacementSpec.parse(PlacementSpec.java:108)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.Client.init(Client.java:462)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDistributedShellWithPlacementConstraint(TestDistributedShell.java:1780)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: 
> org.apache.hadoop.yarn.util.constraint.PlacementConstraintParseException: 
> Source allocation tags is required for a multi placement constraint 
> expression.
>   at 
> org.apache.hadoop.yarn.util.constraint.PlacementConstraintParser.parsePlacementSpec(PlacementConstraintParser.java:740)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.PlacementSpec.parse(PlacementSpec.java:94)
>   ... 16 more
> {code}






[jira] [Created] (YARN-9321) Document Distributed Shell examples in YARN Node Attributes Section

2019-02-20 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created YARN-9321:
---

 Summary: Document Distributed Shell examples in YARN Node 
Attributes Section
 Key: YARN-9321
 URL: https://issues.apache.org/jira/browse/YARN-9321
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.2.0
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


Document Distributed Shell examples in YARN Node Attributes Section - follow-up 
from YARN-9258.






[jira] [Commented] (YARN-9258) Support to specify allocation tags without constraint in distributed shell CLI

2019-02-20 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773690#comment-16773690
 ] 

Prabhu Joseph commented on YARN-9258:
-

[~cheersyang] Attached the v4 patch with {{PlacementConstraints.md}} modified. 
Will create a doc Jira for the Node Attributes section. Thanks.

> Support to specify allocation tags without constraint in distributed shell CLI
> --
>
> Key: YARN-9258
> URL: https://issues.apache.org/jira/browse/YARN-9258
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: distributed-shell
>Affects Versions: 3.1.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9258-001.patch, YARN-9258-002.patch, 
> YARN-9258-003.patch, YARN-9258-004.patch
>
>
> DistributedShell PlacementSpec fails to parse 
> {color:#d04437}zk=1:spark=1,NOTIN,NODE,zk{color}
> {code}
> java.lang.IllegalArgumentException: Invalid placement spec: 
> zk=1:spark=1,NOTIN,NODE,zk
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.PlacementSpec.parse(PlacementSpec.java:108)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.Client.init(Client.java:462)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDistributedShellWithPlacementConstraint(TestDistributedShell.java:1780)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: 
> org.apache.hadoop.yarn.util.constraint.PlacementConstraintParseException: 
> Source allocation tags is required for a multi placement constraint 
> expression.
>   at 
> org.apache.hadoop.yarn.util.constraint.PlacementConstraintParser.parsePlacementSpec(PlacementConstraintParser.java:740)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.PlacementSpec.parse(PlacementSpec.java:94)
>   ... 16 more
> {code}






[jira] [Commented] (YARN-9258) Support to specify allocation tags without constraint in distributed shell CLI

2019-02-20 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773682#comment-16773682
 ] 

Weiwei Yang commented on YARN-9258:
---

Hi [~Prabhu Joseph]

It looks almost good, but I think the document needs some refinement. I suggest 
modifying it as follows:
{noformat}
PlacementSpec  => "" | KeyVal;PlacementSpec
KeyVal => SourceTag,ConstraintExpr
SourceTag  => String(NumContainers)
ConstraintExpr => SingleConstraint | CompositeConstraint
SingleConstraint   => "IN",Scope,TargetTag | "NOTIN",Scope,TargetTag |  
  "CARDINALITY",Scope,TargetTag,MinCard,MaxCard |  
NodeAttributeConstraintExpr
NodeAttributeConstraintExpr => NodeAttributeName=Value, NodeAttributeName!=Value
CompositeConstraint => AND(ConstraintList) | OR(ConstraintList)
ConstraintList  => Constraint | Constraint:ConstraintList
NumContainers   => int
Scope   => "NODE" | "RACK"
TargetTag   => String
MinCard => int
MaxCard => int
{noformat}
The main difference from your patch is that we don't list 
{{NodeAttributeConstraint}} in {{ConstraintExpr}}, because that is actually a 
type of {{SingleConstraint}}. I also slightly modified the {{SourceTag}} format.
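
As a concrete example under this grammar (an illustrative invocation; the jar 
path is a placeholder), the spec from this Jira's description would be passed 
to the distributed shell like this:
{code}
# One "zk" container with no constraint, plus one "spark" container that
# must NOT land on a node that already holds a "zk" container.
yarn org.apache.hadoop.yarn.applications.distributedshell.Client \
  -jar <path-to-hadoop-yarn-applications-distributedshell-jar> \
  -shell_command sleep -shell_args 10 \
  -placement_spec "zk=1:spark=1,NOTIN,NODE,zk"
{code}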

For the node attributes, maybe we can have another Jira to add some examples 
run from the distributed shell in {{NodeAttributes.md}}. Then in 
{{PlacementConstraints.md}} we can simply add a link to that, which would make 
more sense for readers.

Thanks

> Support to specify allocation tags without constraint in distributed shell CLI
> --
>
> Key: YARN-9258
> URL: https://issues.apache.org/jira/browse/YARN-9258
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: distributed-shell
>Affects Versions: 3.1.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9258-001.patch, YARN-9258-002.patch, 
> YARN-9258-003.patch
>
>
> DistributedShell PlacementSpec fails to parse 
> {color:#d04437}zk=1:spark=1,NOTIN,NODE,zk{color}
> {code}
> java.lang.IllegalArgumentException: Invalid placement spec: 
> zk=1:spark=1,NOTIN,NODE,zk
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.PlacementSpec.parse(PlacementSpec.java:108)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.Client.init(Client.java:462)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDistributedShellWithPlacementConstraint(TestDistributedShell.java:1780)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: 
> org.apache.hadoop.yarn.util.constraint.PlacementConstraintParseException: 
> Source allocation tags is required for a multi placement constraint 
> expression.
>   at 
> org.apache.hadoop.yarn.util.constraint.PlacementConstraintParser.parsePlacementSpec(PlacementConstraintParser.java:740)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.PlacementSpec.parse(PlacementSpec.java:94)
>   ... 16 more
> {code}






[jira] [Commented] (YARN-5933) ATS stale entries in active directory causes ApplicationNotFoundException in RM

2019-02-20 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773681#comment-16773681
 ] 

Prabhu Joseph commented on YARN-5933:
-

[~bibinchundatt] Yes, I think YARN-8201 fixes this issue, along with reducing 
yarn.timeline-service.entity-group-fs-store.unknown-active-seconds at the ATS.

> ATS stale entries in active directory causes ApplicationNotFoundException in 
> RM
> ---
>
> Key: YARN-5933
> URL: https://issues.apache.org/jira/browse/YARN-5933
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>
> On a secure cluster where the ATS is down, a submitted Tez job will fail 
> while getting the TIMELINE_DELEGATION_TOKEN with the below exception
> {code}
> 0: jdbc:hive2://kerberos-2.openstacklocal:100> select csmallint from 
> alltypesorc group by csmallint;
> INFO  : Session is already open
> INFO  : Dag name: select csmallint from alltypesor...csmallint(Stage-1)
> INFO  : Tez session was closed. Reopening...
> ERROR : Failed to execute tez graph.
> java.lang.RuntimeException: Failed to connect to timeline server. Connection 
> retries limit exceeded. The posted timeline event may be missing
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:266)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:590)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.getDelegationToken(TimelineClientImpl.java:506)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getTimelineDelegationToken(YarnClientImpl.java:349)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.addTimelineDelegationToken(YarnClientImpl.java:330)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:250)
>   at 
> org.apache.tez.client.TezYarnClient.submitApplication(TezYarnClient.java:72)
>   at org.apache.tez.client.TezClient.start(TezClient.java:409)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:196)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.closeAndOpen(TezSessionPoolManager.java:311)
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:453)
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:180)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1728)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1485)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1262)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1126)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1121)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Tez YarnClient has received an applicationId from the RM. On restarting the 
> ATS now, the ATS tries to get the application report from the RM, and so the 
> RM throws an ApplicationNotFoundException. The ATS keeps on requesting, which 
> floods the RM.
> {code}
> RM logs:
> 2016-11-23 13:53:57,345 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new 
> applicationId: 5
> 2016-11-23 14:05:04,936 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 9 on 8050, call 
> org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
> from 172.26.71.120:37699 Call#26 Retry#0
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
> with id 'application_1479897867169_0005' doesn't exist in

[jira] [Commented] (YARN-5933) ATS stale entries in active directory causes ApplicationNotFoundException in RM

2019-02-20 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773670#comment-16773670
 ] 

Bibin A Chundatt commented on YARN-5933:


[~Prabhu Joseph]

YARN-8201 solves the log flooding issue, right?

> ATS stale entries in active directory causes ApplicationNotFoundException in 
> RM
> ---
>
> Key: YARN-5933
> URL: https://issues.apache.org/jira/browse/YARN-5933
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>
> On a secure cluster where the ATS is down, a submitted Tez job will fail 
> while getting the TIMELINE_DELEGATION_TOKEN with the below exception
> {code}
> 0: jdbc:hive2://kerberos-2.openstacklocal:100> select csmallint from 
> alltypesorc group by csmallint;
> INFO  : Session is already open
> INFO  : Dag name: select csmallint from alltypesor...csmallint(Stage-1)
> INFO  : Tez session was closed. Reopening...
> ERROR : Failed to execute tez graph.
> java.lang.RuntimeException: Failed to connect to timeline server. Connection 
> retries limit exceeded. The posted timeline event may be missing
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$TimelineClientConnectionRetry.retryOn(TimelineClientImpl.java:266)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.operateDelegationToken(TimelineClientImpl.java:590)
>   at 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.getDelegationToken(TimelineClientImpl.java:506)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getTimelineDelegationToken(YarnClientImpl.java:349)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.addTimelineDelegationToken(YarnClientImpl.java:330)
>   at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:250)
>   at 
> org.apache.tez.client.TezYarnClient.submitApplication(TezYarnClient.java:72)
>   at org.apache.tez.client.TezClient.start(TezClient.java:409)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:196)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.closeAndOpen(TezSessionPoolManager.java:311)
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:453)
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:180)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1728)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1485)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1262)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1126)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1121)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Tez YarnClient has received an applicationId from the RM. On restarting the 
> ATS now, the ATS tries to get the application report from the RM, and so the 
> RM throws an ApplicationNotFoundException. The ATS keeps on requesting, which 
> floods the RM.
> {code}
> RM logs:
> 2016-11-23 13:53:57,345 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new 
> applicationId: 5
> 2016-11-23 14:05:04,936 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 9 on 8050, call 
> org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
> from 172.26.71.120:37699 Call#26 Retry#0
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
> with id 'application_1479897867169_0005' doesn't exist in RM.
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.g

[jira] [Commented] (YARN-7129) Application Catalog for YARN applications

2019-02-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773613#comment-16773613
 ] 

Hadoop QA commented on YARN-7129:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 16 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
23s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m  
2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
 2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 13m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 48s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m  
6s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 14m 
20s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 53s{color} | {color:orange} root: The patch generated 9 new + 4 unchanged - 
0 fixed = 13 total (was 4) {color} |
| {color:green}+1{color} | {color:green} hadolint {color} | {color:green}  0m  
0s{color} | {color:green} There were no new hadolint issues. {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 13m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
 1s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:orange}-0{color} | {color:orange} shelldocs {color} | {color:orange}  
0m 11s{color} | {color:orange} The patch generated 136 new + 104 unchanged - 0 
fixed = 240 total (was 104) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m 
14s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 51s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-catalog/hadoop-yarn-applications-catalog-docker
 {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
43s{color} | {color:green} the patch passed {color} |
|| 

[jira] [Commented] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.

2019-02-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773604#comment-16773604
 ] 

Hadoop QA commented on YARN-999:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
44s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 49s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
38s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 20s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 4 new + 368 unchanged - 15 fixed = 372 total (was 383) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 29s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
43s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 90m 15s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
43s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}165m  6s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-999 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12959517/YARN-999.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux cd98d5aa27c8 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 371a6db |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-

[jira] [Commented] (YARN-9137) Get the IP and port of the docker container and display it on WEB UI2

2019-02-20 Thread Xun Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773577#comment-16773577
 ] 

Xun Liu commented on YARN-9137:
---

[~eyang], OK, let me finish this work. :D

> Get the IP and port of the docker container and display it on WEB UI2
> -
>
> Key: YARN-9137
> URL: https://issues.apache.org/jira/browse/YARN-9137
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xun Liu
>Assignee: Xun Liu
>Priority: Major
>
> 1) When using a container network such as Calico, the IP of the container is 
> not the IP of the host but is allocated from a private network, and 
> different containers can connect to each other directly.
>  Exposing the services in the container through a reverse proxy such as Nginx 
> makes it easy for users to view the IP and port on WEB UI2 and use the 
> services in the container, such as Tomcat, TensorBoard, and so on.
>  2) When not using a container network such as Calico, the container still 
> has its own container port.
> So the IP and port of the docker container need to be displayed on WEB UI2.
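As a purely illustrative sketch of one way such a container IP could be
discovered, the following hypothetical Java snippet shells out to
{{docker inspect}} with a Go template. The container name and the surrounding
integration are assumptions for illustration, not part of the patch.

{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: read a container's IP address via docker inspect.
// "mycontainer" is a placeholder name, not anything from YARN-9137.
public class ContainerIpSketch {
  public static void main(String[] args) throws Exception {
    Process p = new ProcessBuilder("docker", "inspect", "-f",
        "{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}",
        "mycontainer").start();
    try (BufferedReader r = new BufferedReader(
        new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
      System.out.println("container IP: " + r.readLine());
    }
  }
}
{code}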



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9060) [YARN-8851] Phase 1 - Support device isolation and use the Nvidia GPU plugin as an example

2019-02-20 Thread Zhankun Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773570#comment-16773570
 ] 

Zhankun Tang commented on YARN-9060:


[~jojochuang], thanks for reporting this. I'll check it ASAP.

> [YARN-8851] Phase 1 - Support device isolation and use the Nvidia GPU plugin 
> as an example
> --
>
> Key: YARN-9060
> URL: https://issues.apache.org/jira/browse/YARN-9060
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9060-trunk.001.patch, YARN-9060-trunk.002.patch, 
> YARN-9060-trunk.003.patch, YARN-9060-trunk.004.patch, 
> YARN-9060-trunk.005.patch, YARN-9060-trunk.006.patch, 
> YARN-9060-trunk.007.patch, YARN-9060-trunk.008.patch, 
> YARN-9060-trunk.009.patch, YARN-9060-trunk.010.patch, 
> YARN-9060-trunk.011.patch, YARN-9060-trunk.012.patch, 
> YARN-9060-trunk.013.patch, YARN-9060-trunk.014.patch, 
> YARN-9060-trunk.015.patch, YARN-9060-trunk.016.patch, 
> YARN-9060-trunk.017.patch, YARN-9060-trunk.018.patch
>
>
> Due to the cgroups v1 implementation policy in the Linux kernel, we cannot 
> update the value of the device cgroups controller unless we have root 
> permission 
> ([here|https://github.com/torvalds/linux/blob/6f0d349d922ba44e4348a17a78ea51b7135965b1/security/device_cgroup.c#L604]).
>  So we need to support this in container-executor for the Java layer to invoke.
> This Jira will have three parts:
>  # native c-e module
>  # Java layer code to isolate devices for container (docker and non-docker)
>  # A sample Nvidia GPU plugin
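To make the root-permission constraint concrete, here is a minimal sketch of
what the privileged write looks like. The entry format ("type major:minor
access") follows the cgroups v1 device controller; the cgroup path and class
name are illustrative assumptions, not the patch's API.

{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: deny a container access to a character device by
// writing a cgroups v1 device-controller entry. This write fails with EPERM
// unless the caller is root, which is why the real work must be delegated to
// the setuid container-executor binary instead of the Java NM process.
public class DeviceCgroupSketch {
  static String denyEntry(int major, int minor) {
    return String.format("c %d:%d rwm", major, minor);  // e.g. "c 195:0 rwm"
  }

  public static void main(String[] args) throws IOException {
    // Placeholder hierarchy path; the real per-container path is NM-managed.
    Path deny = Paths.get(
        "/sys/fs/cgroup/devices/hadoop-yarn/container_01/devices.deny");
    Files.write(deny, denyEntry(195, 0).getBytes(StandardCharsets.UTF_8));
  }
}
{code}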



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9137) Get the IP and port of the docker container and display it on WEB UI2

2019-02-20 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773515#comment-16773515
 ] 

Eric Yang commented on YARN-9137:
-

[~liuxun323] sorry for the late reply.  I think this is a good feature to have. 
 You are welcome to contribute.

> Get the IP and port of the docker container and display it on WEB UI2
> -
>
> Key: YARN-9137
> URL: https://issues.apache.org/jira/browse/YARN-9137
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xun Liu
>Assignee: Xun Liu
>Priority: Major
>
> 1) When using a container network such as Calico, the IP of the container is 
> not the IP of the host but is allocated from a private network, and 
> different containers can connect to each other directly.
>  Exposing the services in the container through a reverse proxy such as Nginx 
> makes it easy for users to view the IP and port on WEB UI2 and use the 
> services in the container, such as Tomcat, TensorBoard, and so on.
>  2) When not using a container network such as Calico, the container still 
> has its own container port.
> So the IP and port of the docker container need to be displayed on WEB UI2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-3554) Default value for maximum nodemanager connect wait time is too high

2019-02-20 Thread Rayman (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773513#comment-16773513
 ] 

Rayman commented on YARN-3554:
--

The RetryUpToMaximumTimeWithFixedSleep policy takes as input a maxTime and a 
sleepTime, and internally is implemented as a RetryUpToMaximumCountWithFixedSleep 
with maxCount = maxTime / sleepTime.

This has a problem: it does not account for the time spent performing the 
actual retry. For example, RetryUpToMaximumTimeWithFixedSleep with maxTime = 30 
sec and sleepTime = 1 sec will take up to 90 seconds if each retry attempt 
(e.g., a connection timeout) takes 2 seconds to return: 30 * (2 + 1) = 90.

A policy claiming to be RetryUpToMaximumTimeWithFixedSleep should *actually* 
respect the *maximum time*, e.g., by recording a timestamp/timer.
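A minimal sketch of the suggested fix, assuming nothing about Hadoop's actual
RetryPolicy API: bound the retries by a wall-clock deadline instead of a
precomputed count, so time spent inside each failed attempt is charged against
maxTime.

{code:java}
import java.util.concurrent.Callable;

// Illustrative deadline-based retry, not Hadoop's implementation.
public class DeadlineRetry {
  public static <T> T call(Callable<T> op, long maxTimeMs, long sleepMs)
      throws Exception {
    long deadline = System.currentTimeMillis() + maxTimeMs;
    while (true) {
      try {
        return op.call();
      } catch (Exception e) {
        // Unlike maxTime / sleepTime, this accounts for time spent in op.call().
        if (System.currentTimeMillis() + sleepMs >= deadline) {
          throw e;  // out of time: surface the last failure
        }
        Thread.sleep(sleepMs);
      }
    }
  }
}
{code}

With maxTime = 30 sec, sleepTime = 1 sec, and 2-second failed attempts, this
version stops after roughly 30 seconds instead of 90.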

> Default value for maximum nodemanager connect wait time is too high
> ---
>
> Key: YARN-3554
> URL: https://issues.apache.org/jira/browse/YARN-3554
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Naganarasimha G R
>Priority: Major
>  Labels: BB2015-05-RFC, newbie
> Fix For: 2.8.0, 2.7.1, 2.6.2, 3.0.0-alpha1
>
> Attachments: YARN-3554-20150429-2.patch, YARN-3554.20150429-1.patch
>
>
> The default value for yarn.client.nodemanager-connect.max-wait-ms is 900000 
> msec or 15 minutes, which is way too high.  The default container expiry time 
> from the RM and the default task timeout in MapReduce are both only 10 
> minutes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.

2019-02-20 Thread JIRA


[ 
https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773507#comment-16773507
 ] 

Íñigo Goiri commented on YARN-999:
--

Based on feedback from [~curino], I made it initially trigger preemption 
(notifying the AM) and, after the timeout passes, actually kill the container.
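A hedged sketch of that sequence, using only standard java.util.concurrent
types; the callbacks stand in for the RM/NM plumbing and are not the patch's
actual classes.

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative preempt-then-kill: signal the AM first, and hard-kill only if
// the container is still held once the grace period expires.
public class PreemptThenKill {
  private final ScheduledExecutorService timer =
      Executors.newSingleThreadScheduledExecutor();

  public void reclaim(AtomicBoolean released, Runnable notifyAm,
      Runnable killContainer, long timeoutMs) {
    notifyAm.run();  // soft preemption: give the AM a chance to release
    timer.schedule(() -> {
      if (!released.get()) {   // AM did not comply within the timeout
        killContainer.run();   // reclaim the resource forcibly
      }
    }, timeoutMs, TimeUnit.MILLISECONDS);
  }
}
{code}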

> In case of long running tasks, reduce node resource should balloon out 
> resource quickly by calling preemption API and suspending running task. 
> ---
>
> Key: YARN-999
> URL: https://issues.apache.org/jira/browse/YARN-999
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Assignee: Íñigo Goiri
>Priority: Major
> Attachments: YARN-291.000.patch, YARN-999.001.patch, 
> YARN-999.002.patch
>
>
> In the current design and implementation, when we decrease the resource on a 
> node to less than the resource consumption of the currently running tasks, 
> those tasks can still run until the end; no new tasks get assigned to this 
> node (because AvailableResource < 0) until some tasks finish and 
> AvailableResource > 0 again. This is fine for most cases, but for long 
> running tasks it can be too slow for the resource setting to actually take 
> effect, so preemption could be employed here.
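For illustration only, the gating behaviour described above amounts to a check
like the following; the names are placeholders, not the scheduler's real code.

{code:java}
// Illustrative only: after reduceNodeResource, available can go negative and
// new assignments are skipped, but running containers keep their resources.
public class AvailabilityGate {
  public static boolean canAssign(long availableMb, long requestMb) {
    return availableMb > 0 && availableMb >= requestMb;
  }

  public static void main(String[] args) {
    System.out.println(canAssign(-2048, 1024)); // false: node over-committed
    System.out.println(canAssign(4096, 1024));  // true
  }
}
{code}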



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.

2019-02-20 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri updated YARN-999:
-
Attachment: YARN-999.002.patch

> In case of long running tasks, reduce node resource should balloon out 
> resource quickly by calling preemption API and suspending running task. 
> ---
>
> Key: YARN-999
> URL: https://issues.apache.org/jira/browse/YARN-999
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, nodemanager, scheduler
>Reporter: Junping Du
>Assignee: Íñigo Goiri
>Priority: Major
> Attachments: YARN-291.000.patch, YARN-999.001.patch, 
> YARN-999.002.patch
>
>
> In the current design and implementation, when we decrease the resource on a 
> node to less than the resource consumption of the currently running tasks, 
> those tasks can still run until the end; no new tasks get assigned to this 
> node (because AvailableResource < 0) until some tasks finish and 
> AvailableResource > 0 again. This is fine for most cases, but for long 
> running tasks it can be too slow for the resource setting to actually take 
> effect, so preemption could be employed here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7129) Application Catalog for YARN applications

2019-02-20 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773431#comment-16773431
 ] 

Eric Yang commented on YARN-7129:
-

Patch 26 rebased to current trunk.

> Application Catalog for YARN applications
> -
>
> Key: YARN-7129
> URL: https://issues.apache.org/jira/browse/YARN-7129
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: applications
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN Appstore.pdf, YARN-7129.001.patch, 
> YARN-7129.002.patch, YARN-7129.003.patch, YARN-7129.004.patch, 
> YARN-7129.005.patch, YARN-7129.006.patch, YARN-7129.007.patch, 
> YARN-7129.008.patch, YARN-7129.009.patch, YARN-7129.010.patch, 
> YARN-7129.011.patch, YARN-7129.012.patch, YARN-7129.013.patch, 
> YARN-7129.014.patch, YARN-7129.015.patch, YARN-7129.016.patch, 
> YARN-7129.017.patch, YARN-7129.018.patch, YARN-7129.019.patch, 
> YARN-7129.020.patch, YARN-7129.021.patch, YARN-7129.022.patch, 
> YARN-7129.023.patch, YARN-7129.024.patch, YARN-7129.025.patch, 
> YARN-7129.026.patch
>
>
> YARN native services provides a web services API to improve the usability of 
> application deployment on Hadoop using a collection of docker images.  It 
> would be nice to have an application catalog system which provides an 
> editorial and search interface for YARN applications.  This improves the 
> usability of YARN for managing the life cycle of applications.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7129) Application Catalog for YARN applications

2019-02-20 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7129:

Attachment: YARN-7129.026.patch

> Application Catalog for YARN applications
> -
>
> Key: YARN-7129
> URL: https://issues.apache.org/jira/browse/YARN-7129
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: applications
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN Appstore.pdf, YARN-7129.001.patch, 
> YARN-7129.002.patch, YARN-7129.003.patch, YARN-7129.004.patch, 
> YARN-7129.005.patch, YARN-7129.006.patch, YARN-7129.007.patch, 
> YARN-7129.008.patch, YARN-7129.009.patch, YARN-7129.010.patch, 
> YARN-7129.011.patch, YARN-7129.012.patch, YARN-7129.013.patch, 
> YARN-7129.014.patch, YARN-7129.015.patch, YARN-7129.016.patch, 
> YARN-7129.017.patch, YARN-7129.018.patch, YARN-7129.019.patch, 
> YARN-7129.020.patch, YARN-7129.021.patch, YARN-7129.022.patch, 
> YARN-7129.023.patch, YARN-7129.024.patch, YARN-7129.025.patch, 
> YARN-7129.026.patch
>
>
> YARN native services provides a web services API to improve the usability of 
> application deployment on Hadoop using a collection of docker images.  It 
> would be nice to have an application catalog system which provides an 
> editorial and search interface for YARN applications.  This improves the 
> usability of YARN for managing the life cycle of applications.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9320) ConcurrentModificationException in capacity scheduler (updateQueueStatistics)

2019-02-20 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated YARN-9320:
---
Description: 
We are running a snapshot of 2.9 branch, unfortunately I'm not sure off the top 
of my head what version it corresponds to. I can look it up if that's 
important, but I haven't found a bug like this so I suspect it would also 
affect a current version unless fixed by accident.

If it helps, the cluster is very large so we expect node failures/restart 
frequently; I see this happens a couple of times (so it's not really "fatal") 
among a bunch of audit logging for "OPERATION=replaceLabelsOnNode" calls
{noformat}
2019-02-20 13:12:52,785 FATAL [SchedulerEventDispatcher:Event Processor] 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils: 
queueCapacities.getNodePartitionsSet() changed 
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437)
at java.util.HashMap$KeyIterator.next(HashMap.java:1461)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.updateQueueStatistics(CSQueueUtils.java:303)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.updateClusterResource(LeafQueue.java:1879)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.updateClusterResource(ParentQueue.java:897)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updateNodeLabelsAndQueueResource(CapacityScheduler.java:1775)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1633)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:154)
at 
org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:67)

{noformat}

  was:
We are running a snapshot of 2.9 branch, unfortunately I'm not sure off the top 
of my head what version it corresponds to. I can look it up if that's 
important, but I haven't found a bug like this so I suspect it would also 
affect a current version unless fixed by accident.

If it helps, the cluster is very large (1000s of NMs) so we expect node 
failures/restart frequently; I see this happens a couple of times (so it's not 
really "fatal") among a bunch of audit logging for 
"OPERATION=replaceLabelsOnNode" calls
{noformat}
2019-02-20 13:12:52,785 FATAL [SchedulerEventDispatcher:Event Processor] 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils: 
queueCapacities.getNodePartitionsSet() changed 
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437)
at java.util.HashMap$KeyIterator.next(HashMap.java:1461)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.updateQueueStatistics(CSQueueUtils.java:303)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.updateClusterResource(LeafQueue.java:1879)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.updateClusterResource(ParentQueue.java:897)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updateNodeLabelsAndQueueResource(CapacityScheduler.java:1775)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1633)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:154)
at 
org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:67)

{noformat}


> ConcurrentModificationException in capacity scheduler (updateQueueStatistics)
> -
>
> Key: YARN-9320
> URL: https://issues.apache.org/jira/browse/YARN-9320
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.3
>Reporter: Sergey Shelukhin
>Priority: Critical
>
> We are running a snapshot of 2.9 branch, unfortunately I'm not sure off the 
> top of my head what version it corresponds to. I can look it up if that's 
> important, but I haven't found a bug like this so I suspect it would also 
> affect a current version unless fixed by accident.
> If it helps, the cluster is very large so we expect node failures/restart 
> frequently; I see this happens a couple of times (so it's not really "fatal") 
> among a bunch of audit logging for "OPERATION=replaceLabelsOnNode" calls
> {noformat}
> 2019-02-20 13:12:52,785 FATAL [SchedulerEventDispatcher:Event Processor] 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity

[jira] [Updated] (YARN-9320) ConcurrentModificationException in capacity scheduler (updateQueueStatistics)

2019-02-20 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated YARN-9320:
---
Description: 
We are running a snapshot of 2.9 branch, unfortunately I'm not sure off the top 
of my head what version it corresponds to. I can look it up if that's 
important, but I haven't found a bug like this so I suspect it would also 
affect a current version unless fixed by accident.

If it helps, the cluster is very large (1000s of NMs) so we expect node 
failures/restart frequently; I see this happens a couple of times (so it's not 
really "fatal") among a bunch of audit logging for 
"OPERATION=replaceLabelsOnNode" calls
{noformat}
2019-02-20 13:12:52,785 FATAL [SchedulerEventDispatcher:Event Processor] 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils: 
queueCapacities.getNodePartitionsSet() changed 
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437)
at java.util.HashMap$KeyIterator.next(HashMap.java:1461)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.updateQueueStatistics(CSQueueUtils.java:303)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.updateClusterResource(LeafQueue.java:1879)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.updateClusterResource(ParentQueue.java:897)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updateNodeLabelsAndQueueResource(CapacityScheduler.java:1775)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1633)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:154)
at 
org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:67)

{noformat}

  was:
We are running a snapshot of 2.9 branch, unfortunately I'm not sure off the top 
of my head what version it corresponds to. I can look it up if that's 
important, but I haven't found a bug like this so I suspect it would also 
affect a current version unless fixed by accident.

If it helps, the cluster is very large (1000s of NMs) so we expect node 
failures/restart frequently; also some apps may have misconfigured node labels 
specified so node label related stuff may go into corner cases. Still, this 
shouldn't happen based on a user-supplied parameter.

{noformat}
2019-02-20 13:12:52,785 FATAL [SchedulerEventDispatcher:Event Processor] 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils: 
queueCapacities.getNodePartitionsSet() changed 
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437)
at java.util.HashMap$KeyIterator.next(HashMap.java:1461)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.updateQueueStatistics(CSQueueUtils.java:303)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.updateClusterResource(LeafQueue.java:1879)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.updateClusterResource(ParentQueue.java:897)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updateNodeLabelsAndQueueResource(CapacityScheduler.java:1775)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1633)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:154)
at 
org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:67)

{noformat}


> ConcurrentModificationException in capacity scheduler (updateQueueStatistics)
> -
>
> Key: YARN-9320
> URL: https://issues.apache.org/jira/browse/YARN-9320
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.3
>Reporter: Sergey Shelukhin
>Priority: Critical
>
> We are running a snapshot of 2.9 branch, unfortunately I'm not sure off the 
> top of my head what version it corresponds to. I can look it up if that's 
> important, but I haven't found a bug like this so I suspect it would also 
> affect a current version unless fixed by accident.
> If it helps, the cluster is very large (1000s of NMs) so we expect node 
> failures/restart frequently; I see this happens a couple of times (so it's 
> not really "fatal") among a bunch of audit logging for 
> "OPERATION=replaceLabelsOnNode" calls
> {noformat}
> 2019-02-20 13:12:52,785 FATAL [SchedulerEventDispatcher:Event Pro

[jira] [Updated] (YARN-9320) ConcurrentModificationException in capacity scheduler (updateQueueStatistics)

2019-02-20 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated YARN-9320:
---
Description: 
We are running a snapshot of 2.9 branch, unfortunately I'm not sure off the top 
of my head what version it corresponds to. I can look it up if that's 
important, but I haven't found a bug like this so I suspect it would also 
affect a current version unless fixed by accident.

If it helps, the cluster is very large (1000s of NMs) so we expect node 
failures frequently; also some apps may have misconfigured node labels 
specified so node label related stuff may go into corner cases. Still, this 
shouldn't happen based on a user-supplied parameter.

{noformat}
2019-02-20 13:12:52,785 FATAL [SchedulerEventDispatcher:Event Processor] 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils: 
queueCapacities.getNodePartitionsSet() changed 
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437)
at java.util.HashMap$KeyIterator.next(HashMap.java:1461)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.updateQueueStatistics(CSQueueUtils.java:303)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.updateClusterResource(LeafQueue.java:1879)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.updateClusterResource(ParentQueue.java:897)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updateNodeLabelsAndQueueResource(CapacityScheduler.java:1775)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1633)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:154)
at 
org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:67)

{noformat}

  was:
We are running a snapshot of 2.9 branch, unfortunately I'm not sure off the top 
of my head what version it corresponds to. I can look it up if that's 
important, but I haven't found a bug like this so I suspect it would also 
affect a current version unless fixed by accident.

If it helps, the cluster is very large (1000s of NMs) so we expect node 
failures frequently; also some apps may have misconfigured node labels 
specified spo node label related stuff may go into corner cases. Still, this 
shouldn't happen based on a user-supplied parameter.

{noformat}
2019-02-20 13:12:52,785 FATAL [SchedulerEventDispatcher:Event Processor] 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils: 
queueCapacities.getNodePartitionsSet() changed 
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437)
at java.util.HashMap$KeyIterator.next(HashMap.java:1461)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.updateQueueStatistics(CSQueueUtils.java:303)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.updateClusterResource(LeafQueue.java:1879)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.updateClusterResource(ParentQueue.java:897)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updateNodeLabelsAndQueueResource(CapacityScheduler.java:1775)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1633)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:154)
at 
org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:67)

{noformat}


> ConcurrentModificationException in capacity scheduler (updateQueueStatistics)
> -
>
> Key: YARN-9320
> URL: https://issues.apache.org/jira/browse/YARN-9320
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.3
>Reporter: Sergey Shelukhin
>Priority: Critical
>
> We are running a snapshot of 2.9 branch, unfortunately I'm not sure off the 
> top of my head what version it corresponds to. I can look it up if that's 
> important, but I haven't found a bug like this so I suspect it would also 
> affect a current version unless fixed by accident.
> If it helps, the cluster is very large (1000s of NMs) so we expect node 
> failures frequently; also some apps may have misconfigured node labels 
> specified so node label related stuff may go into corner cases. Still, this 
> shouldn't happen based on a user-supplied parameter.
> {noformat}
> 2019-02

[jira] [Updated] (YARN-9320) ConcurrentModificationException in capacity scheduler (updateQueueStatistics)

2019-02-20 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated YARN-9320:
---
Description: 
We are running a snapshot of 2.9 branch, unfortunately I'm not sure off the top 
of my head what version it corresponds to. I can look it up if that's 
important, but I haven't found a bug like this so I suspect it would also 
affect a current version unless fixed by accident.

If it helps, the cluster is very large (1000s of NMs) so we expect node 
failures/restart frequently; also some apps may have misconfigured node labels 
specified so node label related stuff may go into corner cases. Still, this 
shouldn't happen based on a user-supplied parameter.

{noformat}
2019-02-20 13:12:52,785 FATAL [SchedulerEventDispatcher:Event Processor] 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils: 
queueCapacities.getNodePartitionsSet() changed 
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437)
at java.util.HashMap$KeyIterator.next(HashMap.java:1461)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.updateQueueStatistics(CSQueueUtils.java:303)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.updateClusterResource(LeafQueue.java:1879)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.updateClusterResource(ParentQueue.java:897)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updateNodeLabelsAndQueueResource(CapacityScheduler.java:1775)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1633)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:154)
at 
org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:67)

{noformat}

  was:
We are running a snapshot of 2.9 branch, unfortunately I'm not sure off the top 
of my head what version it corresponds to. I can look it up if that's 
important, but I haven't found a bug like this so I suspect it would also 
affect a current version unless fixed by accident.

If it helps, the cluster is very large (1000s of NMs) so we expect node 
failures frequently; also some apps may have misconfigured node labels 
specified so node label related stuff may go into corner cases. Still, this 
shouldn't happen based on a user-supplied parameter.

{noformat}
2019-02-20 13:12:52,785 FATAL [SchedulerEventDispatcher:Event Processor] 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils: 
queueCapacities.getNodePartitionsSet() changed 
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437)
at java.util.HashMap$KeyIterator.next(HashMap.java:1461)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.updateQueueStatistics(CSQueueUtils.java:303)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.updateClusterResource(LeafQueue.java:1879)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.updateClusterResource(ParentQueue.java:897)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updateNodeLabelsAndQueueResource(CapacityScheduler.java:1775)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1633)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:154)
at 
org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:67)

{noformat}


> ConcurrentModificationException in capacity scheduler (updateQueueStatistics)
> -
>
> Key: YARN-9320
> URL: https://issues.apache.org/jira/browse/YARN-9320
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.3
>Reporter: Sergey Shelukhin
>Priority: Critical
>
> We are running a snapshot of 2.9 branch, unfortunately I'm not sure off the 
> top of my head what version it corresponds to. I can look it up if that's 
> important, but I haven't found a bug like this so I suspect it would also 
> affect a current version unless fixed by accident.
> If it helps, the cluster is very large (1000s of NMs) so we expect node 
> failures/restart frequently; also some apps may have misconfigured node 
> labels specified so node label related stuff may go into corner cases. Still, 
> this shouldn't happen based on a user-supplied parameter.
> {nofo

[jira] [Updated] (YARN-9320) ConcurrentModificationException in capacity scheduler (updateQueueStatistics)

2019-02-20 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated YARN-9320:
---
Description: 
We are running a snapshot of 2.9 branch, unfortunately I'm not sure off the top 
of my head what version it corresponds to. I can look it up if that's 
important, but I haven't found a bug like this so I suspect it would also 
affect a current version unless fixed by accident.

If it helps, the cluster is very large (1000s of NMs) so we expect node 
failures frequently; also some apps may have misconfigured node labels 
specified spo node label related stuff may go into corner cases. Still, this 
shouldn't happen based on a user-supplied parameter.

{noformat}
2019-02-20 13:12:52,785 FATAL [SchedulerEventDispatcher:Event Processor] 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils: 
queueCapacities.getNodePartitionsSet() changed 
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437)
at java.util.HashMap$KeyIterator.next(HashMap.java:1461)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.updateQueueStatistics(CSQueueUtils.java:303)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.updateClusterResource(LeafQueue.java:1879)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.updateClusterResource(ParentQueue.java:897)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updateNodeLabelsAndQueueResource(CapacityScheduler.java:1775)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1633)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:154)
at 
org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:67)

{noformat}

  was:
We are running a snapshot of 2.9 branch, unfortunately I'm not sure off the top 
of my head what version it corresponds to. I can look it up if that's 
important, but I haven't found a bug like this so I suspect it would also 
affect a current version unless fixed by accident.

{noformat}
2019-02-20 13:12:52,785 FATAL [SchedulerEventDispatcher:Event Processor] 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils: 
queueCapacities.getNodePartitionsSet() changed 
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437)
at java.util.HashMap$KeyIterator.next(HashMap.java:1461)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.updateQueueStatistics(CSQueueUtils.java:303)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.updateClusterResource(LeafQueue.java:1879)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.updateClusterResource(ParentQueue.java:897)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updateNodeLabelsAndQueueResource(CapacityScheduler.java:1775)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1633)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:154)
at 
org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:67)

{noformat}


> ConcurrentModificationException in capacity scheduler (updateQueueStatistics)
> -
>
> Key: YARN-9320
> URL: https://issues.apache.org/jira/browse/YARN-9320
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.3
>Reporter: Sergey Shelukhin
>Priority: Critical
>
> We are running a snapshot of 2.9 branch, unfortunately I'm not sure off the 
> top of my head what version it corresponds to. I can look it up if that's 
> important, but I haven't found a bug like this so I suspect it would also 
> affect a current version unless fixed by accident.
> If it helps, the cluster is very large (1000s of NMs) so we expect node 
> failures frequently; also some apps may have misconfigured node labels 
> specified spo node label related stuff may go into corner cases. Still, this 
> shouldn't happen based on a user-supplied parameter.
> {noformat}
> 2019-02-20 13:12:52,785 FATAL [SchedulerEventDispatcher:Event Processor] 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils:
>  queueCapacities.getNodePartitionsSet() changed 
> java.util.ConcurrentModificationException
>   at java.util.HashMap$Has

[jira] [Created] (YARN-9320) ConcurrentModificationException in capacity scheduler (updateQueueStatistics)

2019-02-20 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created YARN-9320:
--

 Summary: ConcurrentModificationException in capacity scheduler 
(updateQueueStatistics)
 Key: YARN-9320
 URL: https://issues.apache.org/jira/browse/YARN-9320
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.9.3
Reporter: Sergey Shelukhin


We are running a snapshot of 2.9 branch, unfortunately I'm not sure off the top 
of my head what version it corresponds to. I can look it up if that's 
important, but I haven't found a bug like this so I suspect it would also 
affect a current version unless fixed by accident.

{noformat}
2019-02-20 13:12:52,785 FATAL [SchedulerEventDispatcher:Event Processor] 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils: 
queueCapacities.getNodePartitionsSet() changed 
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437)
at java.util.HashMap$KeyIterator.next(HashMap.java:1461)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CSQueueUtils.updateQueueStatistics(CSQueueUtils.java:303)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.updateClusterResource(LeafQueue.java:1879)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.updateClusterResource(ParentQueue.java:897)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updateNodeLabelsAndQueueResource(CapacityScheduler.java:1775)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1633)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:154)
at 
org.apache.hadoop.yarn.event.EventDispatcher$EventProcessor.run(EventDispatcher.java:67)

{noformat}
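For reference, a minimal single-threaded reproduction of this failure mode
(assumed from the stack trace: a live HashMap key view being iterated while
the map is structurally modified); in the RM the modification would come from
a concurrent replaceLabelsOnNode call rather than the loop body itself.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Throws java.util.ConcurrentModificationException on the second next():
// keySet() is a live view, and put() of a new key is a structural change.
public class CmeRepro {
  public static void main(String[] args) {
    Map<String, Integer> partitions = new HashMap<>();
    partitions.put("default", 1);
    partitions.put("labelX", 2);
    for (String p : partitions.keySet()) {
      partitions.put("labelY", 3);  // structural modification mid-iteration
    }
    // Conventional fixes: iterate a copy (new HashSet<>(map.keySet())),
    // hold the same lock on both paths, or use a concurrent collection.
  }
}
{code}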



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9317) DefaultAMSProcessor#allocate timelineServiceV2Enabled check is costly

2019-02-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773398#comment-16773398
 ] 

Hadoop QA commented on YARN-9317:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
 9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 22s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  3m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  1s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m 
56s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 91m 54s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
29s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}171m 50s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.TestResourceTrackerService |
|   | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9317 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12959477/YARN-9317-001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux daf1056abd63 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Pe

[jira] [Commented] (YARN-7129) Application Catalog for YARN applications

2019-02-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773349#comment-16773349
 ] 

Hadoop QA commented on YARN-7129:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m 10s{color} 
| {color:red} YARN-7129 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-7129 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12959492/YARN-7129.025.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/23463/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Application Catalog for YARN applications
> -
>
> Key: YARN-7129
> URL: https://issues.apache.org/jira/browse/YARN-7129
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: applications
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN Appstore.pdf, YARN-7129.001.patch, 
> YARN-7129.002.patch, YARN-7129.003.patch, YARN-7129.004.patch, 
> YARN-7129.005.patch, YARN-7129.006.patch, YARN-7129.007.patch, 
> YARN-7129.008.patch, YARN-7129.009.patch, YARN-7129.010.patch, 
> YARN-7129.011.patch, YARN-7129.012.patch, YARN-7129.013.patch, 
> YARN-7129.014.patch, YARN-7129.015.patch, YARN-7129.016.patch, 
> YARN-7129.017.patch, YARN-7129.018.patch, YARN-7129.019.patch, 
> YARN-7129.020.patch, YARN-7129.021.patch, YARN-7129.022.patch, 
> YARN-7129.023.patch, YARN-7129.024.patch, YARN-7129.025.patch
>
>
> YARN native services provides a web services API to improve the usability of 
> application deployment on Hadoop using a collection of docker images.  It 
> would be nice to have an application catalog system which provides an 
> editorial and search interface for YARN applications.  This improves the 
> usability of YARN for managing the life cycle of applications.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9315) TestCapacitySchedulerMetrics fails intermittently

2019-02-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773348#comment-16773348
 ] 

Hadoop QA commented on YARN-9315:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 54s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 35s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 94m 11s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}149m 35s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerAutoCreatedQueuePreemption
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9315 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12959471/YARN-9315-002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 82c08799d0e5 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 
5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / aa3ad36 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/23460/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/23460/testReport/ |
| Max. process+thread count | 867 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-reso

[jira] [Updated] (YARN-7129) Application Catalog for YARN applications

2019-02-20 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7129:

Attachment: YARN-7129.025.patch

> Application Catalog for YARN applications
> -
>
> Key: YARN-7129
> URL: https://issues.apache.org/jira/browse/YARN-7129
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: applications
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN Appstore.pdf, YARN-7129.001.patch, 
> YARN-7129.002.patch, YARN-7129.003.patch, YARN-7129.004.patch, 
> YARN-7129.005.patch, YARN-7129.006.patch, YARN-7129.007.patch, 
> YARN-7129.008.patch, YARN-7129.009.patch, YARN-7129.010.patch, 
> YARN-7129.011.patch, YARN-7129.012.patch, YARN-7129.013.patch, 
> YARN-7129.014.patch, YARN-7129.015.patch, YARN-7129.016.patch, 
> YARN-7129.017.patch, YARN-7129.018.patch, YARN-7129.019.patch, 
> YARN-7129.020.patch, YARN-7129.021.patch, YARN-7129.022.patch, 
> YARN-7129.023.patch, YARN-7129.024.patch, YARN-7129.025.patch
>
>
> YARN native services provides a web services API to improve the usability of 
> application deployment on Hadoop using a collection of docker images.  It 
> would be nice to have an application catalog system which provides an 
> editorial and search interface for YARN applications.  This improves the 
> usability of YARN for managing the life cycle of applications.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7129) Application Catalog for YARN applications

2019-02-20 Thread Eric Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773340#comment-16773340
 ] 

Eric Yang commented on YARN-7129:
-

[~billie.rinaldi] Patch 25 updates the JavaScript unit test framework (Karma) and 
the apidoc version to remove some warnings about using older packages. It also 
adds logic to ensure the unit test framework is not bundled into the web 
application archive. The Docker image is renamed to 
apache/hadoop-yarn-applications-catalog-docker to match the Maven project name.

> Application Catalog for YARN applications
> -
>
> Key: YARN-7129
> URL: https://issues.apache.org/jira/browse/YARN-7129
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: applications
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN Appstore.pdf, YARN-7129.001.patch, 
> YARN-7129.002.patch, YARN-7129.003.patch, YARN-7129.004.patch, 
> YARN-7129.005.patch, YARN-7129.006.patch, YARN-7129.007.patch, 
> YARN-7129.008.patch, YARN-7129.009.patch, YARN-7129.010.patch, 
> YARN-7129.011.patch, YARN-7129.012.patch, YARN-7129.013.patch, 
> YARN-7129.014.patch, YARN-7129.015.patch, YARN-7129.016.patch, 
> YARN-7129.017.patch, YARN-7129.018.patch, YARN-7129.019.patch, 
> YARN-7129.020.patch, YARN-7129.021.patch, YARN-7129.022.patch, 
> YARN-7129.023.patch, YARN-7129.024.patch
>
>
> YARN native services provides a web services API to improve the usability of 
> application deployment on Hadoop using a collection of Docker images.  It would 
> be nice to have an application catalog system which provides an editorial and 
> search interface for YARN applications.  This improves the usability of YARN for 
> managing the life cycle of applications.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7129) Application Catalog for YARN applications

2019-02-20 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7129:

Attachment: (was: YARN-7129.025.patch)

> Application Catalog for YARN applications
> -
>
> Key: YARN-7129
> URL: https://issues.apache.org/jira/browse/YARN-7129
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: applications
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN Appstore.pdf, YARN-7129.001.patch, 
> YARN-7129.002.patch, YARN-7129.003.patch, YARN-7129.004.patch, 
> YARN-7129.005.patch, YARN-7129.006.patch, YARN-7129.007.patch, 
> YARN-7129.008.patch, YARN-7129.009.patch, YARN-7129.010.patch, 
> YARN-7129.011.patch, YARN-7129.012.patch, YARN-7129.013.patch, 
> YARN-7129.014.patch, YARN-7129.015.patch, YARN-7129.016.patch, 
> YARN-7129.017.patch, YARN-7129.018.patch, YARN-7129.019.patch, 
> YARN-7129.020.patch, YARN-7129.021.patch, YARN-7129.022.patch, 
> YARN-7129.023.patch, YARN-7129.024.patch
>
>
> YARN native services provides a web services API to improve the usability of 
> application deployment on Hadoop using a collection of Docker images.  It would 
> be nice to have an application catalog system which provides an editorial and 
> search interface for YARN applications.  This improves the usability of YARN for 
> managing the life cycle of applications.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-02-20 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773299#comment-16773299
 ] 

Yufei Gu edited comment on YARN-9278 at 2/20/19 7:47 PM:
-

Hi [~uranus], this seems to be a perf issue for a busy, large cluster due to the 
preemption implementation, which iterates over nodes and checks each one. 
The idea of setting a node-count threshold doesn't look elegant, but it is 
reasonable if we can't change the iterate-and-check way of identifying 
preemptable containers. It may not be the only idea, though.

Rather than introducing more complexity into FS preemption (it is already very 
complicated), there are some workarounds you can try: increase the FairShare 
Preemption Timeout and the FairShare Preemption Threshold to reduce the chance of 
preemption. This is especially useful for a large cluster, since there is a 
better chance of getting resources just by waiting. 



was (Author: yufeigu):
Hi [~uranus], this seems to be a perf issue for a busy, large cluster due to the 
preemption implementation, which iterates over nodes and checks each one. 

I would suggest lowering 
{{yarn.scheduler.fair.preemption.cluster-utilization-threshold}} to let 
preemption kick in earlier for a large cluster. The default value is 80%, which 
means preemption won't kick in until 80% of the whole cluster's resources have 
been used. Please be aware that a low utilization threshold may cause 
unnecessary container churn, so you don't want it to be too low. 

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the number of nodes to make preemption more efficient.
> Just like this,
> {code:java}
> // we should not iterate over all nodes; that would be very slow
> long maxTryNodeNum = 
> context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum) {
>   Collections.shuffle(potentialNodes);
>   // keep only the first maxTryNodeNum nodes after shuffling
>   List newPotentialNodes = new ArrayList();
>   for (int i = 0; i < maxTryNodeNum; i++) {
>     newPotentialNodes.add(potentialNodes.get(i));
>   }
>   potentialNodes = newPotentialNodes;
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7129) Application Catalog for YARN applications

2019-02-20 Thread Eric Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated YARN-7129:

Attachment: YARN-7129.025.patch

> Application Catalog for YARN applications
> -
>
> Key: YARN-7129
> URL: https://issues.apache.org/jira/browse/YARN-7129
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: applications
>Reporter: Eric Yang
>Assignee: Eric Yang
>Priority: Major
> Attachments: YARN Appstore.pdf, YARN-7129.001.patch, 
> YARN-7129.002.patch, YARN-7129.003.patch, YARN-7129.004.patch, 
> YARN-7129.005.patch, YARN-7129.006.patch, YARN-7129.007.patch, 
> YARN-7129.008.patch, YARN-7129.009.patch, YARN-7129.010.patch, 
> YARN-7129.011.patch, YARN-7129.012.patch, YARN-7129.013.patch, 
> YARN-7129.014.patch, YARN-7129.015.patch, YARN-7129.016.patch, 
> YARN-7129.017.patch, YARN-7129.018.patch, YARN-7129.019.patch, 
> YARN-7129.020.patch, YARN-7129.021.patch, YARN-7129.022.patch, 
> YARN-7129.023.patch, YARN-7129.024.patch, YARN-7129.025.patch
>
>
> YARN native services provides a web services API to improve the usability of 
> application deployment on Hadoop using a collection of Docker images.  It would 
> be nice to have an application catalog system which provides an editorial and 
> search interface for YARN applications.  This improves the usability of YARN for 
> managing the life cycle of applications.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9315) TestCapacitySchedulerMetrics fails intermittently

2019-02-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773317#comment-16773317
 ] 

Hadoop QA commented on YARN-9315:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
 9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 57s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 57s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 87m 54s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
26s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}134m 17s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueManagementDynamicEditPolicy
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9315 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12959470/YARN-9315-002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a04e20688ef8 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / aa3ad36 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/23459/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/23459/testReport/ |
| Max. process+thread count | 946 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 U: 
h

[jira] [Commented] (YARN-9060) [YARN-8851] Phase 1 - Support device isolation and use the Nvidia GPU plugin as an example

2019-02-20 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773319#comment-16773319
 ] 

Wei-Chiu Chuang commented on YARN-9060:
---

I am not sure why, but the code fails to compile after this commit. Please see 
YARN-9319 for details.

> [YARN-8851] Phase 1 - Support device isolation and use the Nvidia GPU plugin 
> as an example
> --
>
> Key: YARN-9060
> URL: https://issues.apache.org/jira/browse/YARN-9060
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-9060-trunk.001.patch, YARN-9060-trunk.002.patch, 
> YARN-9060-trunk.003.patch, YARN-9060-trunk.004.patch, 
> YARN-9060-trunk.005.patch, YARN-9060-trunk.006.patch, 
> YARN-9060-trunk.007.patch, YARN-9060-trunk.008.patch, 
> YARN-9060-trunk.009.patch, YARN-9060-trunk.010.patch, 
> YARN-9060-trunk.011.patch, YARN-9060-trunk.012.patch, 
> YARN-9060-trunk.013.patch, YARN-9060-trunk.014.patch, 
> YARN-9060-trunk.015.patch, YARN-9060-trunk.016.patch, 
> YARN-9060-trunk.017.patch, YARN-9060-trunk.018.patch
>
>
> Due to the cgroups v1 implementation policy in the Linux kernel, we cannot 
> update the value of the devices cgroup controller unless we have root permission 
> ([here|https://github.com/torvalds/linux/blob/6f0d349d922ba44e4348a17a78ea51b7135965b1/security/device_cgroup.c#L604]).
>  So we need to support this in container-executor for the Java layer to invoke.
> This Jira will have three parts:
>  # native c-e module
>  # Java layer code to isolate devices for containers (docker and non-docker)
>  # A sample Nvidia GPU plugin



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9278) Shuffle nodes when selecting to be preempted nodes

2019-02-20 Thread Yufei Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773299#comment-16773299
 ] 

Yufei Gu commented on YARN-9278:


Hi [~uranus], this seems to be a perf issue for a busy, large cluster due to the 
preemption implementation, which iterates over nodes and checks each one. 

I would suggest lowering 
{{yarn.scheduler.fair.preemption.cluster-utilization-threshold}} to let 
preemption kick in earlier for a large cluster. The default value is 80%, which 
means preemption won't kick in until 80% of the whole cluster's resources have 
been used. Please be aware that a low utilization threshold may cause 
unnecessary container churn, so you don't want it to be too low. 
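
For completeness, a minimal sketch of setting that property programmatically (in 
practice it usually lives in yarn-site.xml; the 0.7 below is only an 
illustration, not a recommendation):
{code:java}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class PreemptionThresholdSketch {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    // Lower the trigger from the 0.8 default so preemption kicks in
    // earlier on a large cluster.
    conf.setFloat(
        "yarn.scheduler.fair.preemption.cluster-utilization-threshold",
        0.7f);
    System.out.println(conf.getFloat(
        "yarn.scheduler.fair.preemption.cluster-utilization-threshold",
        0.8f));
  }
}
{code}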

> Shuffle nodes when selecting to be preempted nodes
> --
>
> Key: YARN-9278
> URL: https://issues.apache.org/jira/browse/YARN-9278
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Zhaohui Xin
>Assignee: Zhaohui Xin
>Priority: Major
>
> We should *shuffle* the nodes to avoid some nodes being preempted frequently. 
> Also, we should *limit* the number of nodes to make preemption more efficient.
> Just like this,
> {code:java}
> // we should not iterate over all nodes; that would be very slow
> long maxTryNodeNum = 
> context.getPreemptionConfig().getToBePreemptedNodeMaxNumOnce();
> if (potentialNodes.size() > maxTryNodeNum) {
>   Collections.shuffle(potentialNodes);
>   // keep only the first maxTryNodeNum nodes after shuffling
>   List newPotentialNodes = new ArrayList();
>   for (int i = 0; i < maxTryNodeNum; i++) {
>     newPotentialNodes.add(potentialNodes.get(i));
>   }
>   potentialNodes = newPotentialNodes;
> }
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-7297) VM Load Aware Hadoop scheduler for cloud environment

2019-02-20 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/YARN-7297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri reassigned YARN-7297:
-

Assignee: (was: Íñigo Goiri)

> VM Load Aware Hadoop scheduler for cloud environment
> 
>
> Key: YARN-7297
> URL: https://issues.apache.org/jira/browse/YARN-7297
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api
>Reporter: Adepu Sree Lakshni
>Priority: Major
>
> Currently YARN runs containers on servers assuming that they own all the 
> resources. The proposal is to use the utilization information from the node and 
> the containers to estimate how much is consumed by external processes and to 
> schedule based on this estimation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5259) Add two metrics at FSOpDurations for doing container assign and completed Performance statistical analysis

2019-02-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773295#comment-16773295
 ] 

Hadoop QA commented on YARN-5259:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} YARN-5259 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-5259 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12835686/YARN-5259-004.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/23462/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Add two metrics at FSOpDurations for doing container assign and completed 
> Performance statistical analysis
> --
>
> Key: YARN-5259
> URL: https://issues.apache.org/jira/browse/YARN-5259
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Reporter: ChenFolin
>Assignee: Íñigo Goiri
>Priority: Major
>  Labels: oct16-easy
> Attachments: YARN-5259-001.patch, YARN-5259-002.patch, 
> YARN-5259-003.patch, YARN-5259-004.patch
>
>
> If the cluster is slow, we cannot tell whether the cause is container 
> assignment or container completion performance.
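
A minimal sketch of what the two requested metrics could look like, written in 
the metrics2 style that FSOpDurations already uses (the class, field, and method 
names below are invented for illustration and are not taken from the attached 
patches):
{code:java}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableRate;

@Metrics(context = "yarn")
class AssignCompleteDurations {
  @Metric("Duration of a container assignment (ms)")
  MutableRate containerAssignDuration;

  @Metric("Duration of a container completion (ms)")
  MutableRate containerCompleteDuration;

  // Registering the source instantiates the @Metric fields.
  static AssignCompleteDurations create() {
    return DefaultMetricsSystem.instance().register(
        "AssignCompleteDurations", "Assign/complete durations",
        new AssignCompleteDurations());
  }

  void recordAssign(long durationMillis) {
    containerAssignDuration.add(durationMillis);
  }

  void recordComplete(long durationMillis) {
    containerCompleteDuration.add(durationMillis);
  }
}
{code}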



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-9319) YARN-9060 does not compile

2019-02-20 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created YARN-9319:
-

 Summary: YARN-9060 does not compile
 Key: YARN-9319
 URL: https://issues.apache.org/jira/browse/YARN-9319
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.3.0
 Environment:  RHEL 6.8, CMake 3.2.0, Java 8u151, gcc version 4.4.7 
20120313 (Red Hat 4.4.7-17) (GCC)
Reporter: Wei-Chiu Chuang


When I do: 

mvn clean install -DskipTests -Pdist,native  -Dmaven.javadoc.skip=true

It does not compile on my machine (RHEL 6.8, CMake 3.2.0, Java 8u151, gcc 
version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC))
{noformat}
[WARNING] [ 54%] Built target test-container-executor
[WARNING] Linking CXX static library libgtest.a
[WARNING] /opt/toolchain/cmake-3.2.0/bin/cmake -P 
CMakeFiles/gtest.dir/cmake_clean_target.cmake
[WARNING] /opt/toolchain/cmake-3.2.0/bin/cmake -E cmake_link_script 
CMakeFiles/gtest.dir/link.txt --verbose=1
[WARNING] /usr/bin/ar cq libgtest.a  
CMakeFiles/gtest.dir/data/4/weichiu/hadoop/hadoop-common-project/hadoop-common/src/main/native/gtest/gtest-all.cc.o
[WARNING] /usr/bin/ranlib libgtest.a
[WARNING] make[2]: Leaving directory 
`/data/4/weichiu/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native'
[WARNING] /opt/toolchain/cmake-3.2.0/bin/cmake -E cmake_progress_report 
/data/4/weichiu/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native/CMakeFiles
  26
[WARNING] [ 54%] Built target gtest
[WARNING] make[1]: Leaving directory 
`/data/4/weichiu/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native'
[WARNING] In file included from 
/data/4/weichiu/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c:27:
[WARNING] 
/data/4/weichiu/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/devices/devices-module.h:31:
 error: redefinition of typedef 'update_cgroups_parameters_function'
[WARNING] 
/data/4/weichiu/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/fpga/fpga-module.h:31:
 note: previous declaration of 'update_cgroups_parameters_function' was here
[WARNING] make[2]: *** 
[CMakeFiles/container-executor.dir/main/native/container-executor/impl/main.c.o]
 Error 1
[WARNING] make[1]: *** [CMakeFiles/container-executor.dir/all] Error 2
[WARNING] make[1]: *** Waiting for unfinished jobs
[WARNING] make: *** [all] Error 2
{noformat}
[~tangzhankun], [~sunilg] care to take a look?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9319) YARN-9060 does not compile

2019-02-20 Thread Wei-Chiu Chuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated YARN-9319:
--
Description: 
When I do: 

mvn clean install -DskipTests -Pdist,native  -Dmaven.javadoc.skip=true

It does not compile on my machine (RHEL 6.8, CMake 3.2.0, Java 8u151, gcc 
version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC))
{noformat}
[WARNING] [ 54%] Built target test-container-executor
[WARNING] Linking CXX static library libgtest.a
[WARNING] /opt/toolchain/cmake-3.2.0/bin/cmake -P 
CMakeFiles/gtest.dir/cmake_clean_target.cmake
[WARNING] /opt/toolchain/cmake-3.2.0/bin/cmake -E cmake_link_script 
CMakeFiles/gtest.dir/link.txt --verbose=1
[WARNING] /usr/bin/ar cq libgtest.a  
CMakeFiles/gtest.dir/data/4/weichiu/hadoop/hadoop-common-project/hadoop-common/src/main/native/gtest/gtest-all.cc.o
[WARNING] /usr/bin/ranlib libgtest.a
[WARNING] make[2]: Leaving directory 
`/data/4/weichiu/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native'
[WARNING] /opt/toolchain/cmake-3.2.0/bin/cmake -E cmake_progress_report 
/data/4/weichiu/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native/CMakeFiles
  26
[WARNING] [ 54%] Built target gtest
[WARNING] make[1]: Leaving directory 
`/data/4/weichiu/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native'
[WARNING] In file included from 
/data/4/weichiu/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c:27:
[WARNING] 
/data/4/weichiu/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/devices/devices-module.h:31:
 error: redefinition of typedef 'update_cgroups_parameters_function'
[WARNING] 
/data/4/weichiu/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/fpga/fpga-module.h:31:
 note: previous declaration of 'update_cgroups_parameters_function' was here
[WARNING] make[2]: *** 
[CMakeFiles/container-executor.dir/main/native/container-executor/impl/main.c.o]
 Error 1
[WARNING] make[1]: *** [CMakeFiles/container-executor.dir/all] Error 2
[WARNING] make[1]: *** Waiting for unfinished jobs
[WARNING] make: *** [all] Error 2
{noformat}
The code compiles once I revert YARN-9060.

[~tangzhankun], [~sunilg] care to take a look?

  was:
When I do: 

mvn clean install -DskipTests -Pdist,native  -Dmaven.javadoc.skip=true

It does not compile on my machine (RHEL 6.8, CMake 3.2.0, Java 8u151, gcc 
version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC))
{noformat}
[WARNING] [ 54%] Built target test-container-executor
[WARNING] Linking CXX static library libgtest.a
[WARNING] /opt/toolchain/cmake-3.2.0/bin/cmake -P 
CMakeFiles/gtest.dir/cmake_clean_target.cmake
[WARNING] /opt/toolchain/cmake-3.2.0/bin/cmake -E cmake_link_script 
CMakeFiles/gtest.dir/link.txt --verbose=1
[WARNING] /usr/bin/ar cq libgtest.a  
CMakeFiles/gtest.dir/data/4/weichiu/hadoop/hadoop-common-project/hadoop-common/src/main/native/gtest/gtest-all.cc.o
[WARNING] /usr/bin/ranlib libgtest.a
[WARNING] make[2]: Leaving directory 
`/data/4/weichiu/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native'
[WARNING] /opt/toolchain/cmake-3.2.0/bin/cmake -E cmake_progress_report 
/data/4/weichiu/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native/CMakeFiles
  26
[WARNING] [ 54%] Built target gtest
[WARNING] make[1]: Leaving directory 
`/data/4/weichiu/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native'
[WARNING] In file included from 
/data/4/weichiu/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c:27:
[WARNING] 
/data/4/weichiu/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/devices/devices-module.h:31:
 error: redefinition of typedef 'update_cgroups_parameters_function'
[WARNING] 
/data/4/weichiu/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/fpga/fpga-module.h:31:
 note: previous declaration of 'update_cgroups_parameters_function' was here
[WARNING] make[2]: *** 
[CMakeFiles/container-executor.dir/main/native/container-executor/impl/main.c.o]
 Error 1
[WARNING] make[1]: *** [CMakeFiles/container-executor.dir/all] Error 2
[WARNING] make[1]: *** Waiting for unfinished jobs
[WARNING] make: *** [all] Error 2
{noformat}
[~tangzhankun], [~sunilg] care to take a look?


> YARN-9060 does not compile
> --
>
>

[jira] [Commented] (YARN-9265) FPGA plugin fails to recognize Intel Processing Accelerator Card

2019-02-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773274#comment-16773274
 ] 

Hadoop QA commented on YARN-9265:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
38s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 45s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
56s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
35s{color} | {color:green} hadoop-yarn-project/hadoop-yarn: The patch generated 
0 new + 259 unchanged - 11 fixed = 259 total (was 270) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 27s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
49s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
47s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  3m 47s{color} 
| {color:red} hadoop-yarn-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m 
47s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
39s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}110m 41s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.client.api.impl.TestTimelineClientV2Impl |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9265 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12959464/YARN-9265-007.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  xml  |

[jira] [Commented] (YARN-9258) Support to specify allocation tags without constraint in distributed shell CLI

2019-02-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773265#comment-16773265
 ] 

Hadoop QA commented on YARN-9258:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m  9s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
21s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
14s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 16s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
48s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 20m  
0s{color} | {color:green} hadoop-yarn-applications-distributedshell in the 
patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
20s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 99m 59s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9258 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12959463/YARN-9258-003.patch |
| Optional Tests |  dupname  asflicense  compile  ja

[jira] [Updated] (YARN-9317) DefaultAMSProcessor#allocate timelineServiceV2Enabled check is costly

2019-02-20 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9317:

Attachment: YARN-9317-001.patch

> DefaultAMSProcessor#allocate  timelineServiceV2Enabled check is costly
> --
>
> Key: YARN-9317
> URL: https://issues.apache.org/jira/browse/YARN-9317
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9317-001.patch
>
>
> {code}
> if (YarnConfiguration.timelineServiceV2Enabled(
>  getRmContext().getYarnConfiguration())) 
> {code}
> The check is needed only once: evaluate it in DefaultAMSProcessor#init and 
> assign the result to a boolean field instead of re-reading the configuration on 
> every allocate.
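
A minimal sketch of that pattern, using a hypothetical holder class rather than 
the real DefaultAMSProcessor (whose surrounding code is not shown in this issue):
{code:java}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

class TimelineFlagCache {
  // Cached once instead of being re-evaluated on every allocate() call.
  private final boolean timelineServiceV2Enabled;

  TimelineFlagCache(YarnConfiguration conf) {
    this.timelineServiceV2Enabled =
        YarnConfiguration.timelineServiceV2Enabled(conf);
  }

  void allocate() {
    if (timelineServiceV2Enabled) {  // cheap field read on the hot path
      // publish timeline entities ...
    }
  }
}
{code}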



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9315) TestCapacitySchedulerMetrics fails intermittently

2019-02-20 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9315:

Attachment: YARN-9315-002.patch

> TestCapacitySchedulerMetrics fails intermittently
> -
>
> Key: YARN-9315
> URL: https://issues.apache.org/jira/browse/YARN-9315
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: capacity scheduler
>Affects Versions: 3.1.2
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Attachments: YARN-9315-001.patch, YARN-9315-002.patch, 
> YARN-9315-002.patch
>
>
> TestCapacitySchedulerMetrics fails intermittently because the assert check 
> happens before the allocate completes - observed in YARN-8132
> {code}
> [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.177 
> s <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestCapacitySchedulerMetrics
> [ERROR] 
> testCSMetrics(org.apache.hadoop.yarn.server.resourcemanager.TestCapacitySchedulerMetrics)
>   Time elapsed: 3.11 s  <<< FAILURE!
> java.lang.AssertionError: expected:<2> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at org.junit.Assert.assertEquals(Assert.java:631)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestCapacitySchedulerMetrics.testCSMetrics(TestCapacitySchedulerMetrics.java:101)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:1
> {code}
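
One common way to remove this kind of race is to poll the metric with a bounded 
timeout before asserting. A minimal sketch of that idea follows; the helper is 
hypothetical and the actual patch may take a different approach:
{code:java}
import java.util.function.LongSupplier;
import org.junit.Assert;

class EventualAssert {
  // Retry the check for up to waitMillis so the assertion no longer
  // races the asynchronous allocate.
  static void assertEventuallyEquals(long expected, LongSupplier metric,
      long waitMillis) throws InterruptedException {
    long deadline = System.currentTimeMillis() + waitMillis;
    while (metric.getAsLong() != expected
        && System.currentTimeMillis() < deadline) {
      Thread.sleep(100);
    }
    Assert.assertEquals(expected, metric.getAsLong());
  }
}
{code}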



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9315) TestCapacitySchedulerMetrics fails intermittently

2019-02-20 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9315:

Attachment: YARN-9315-002.patch

> TestCapacitySchedulerMetrics fails intermittently
> -
>
> Key: YARN-9315
> URL: https://issues.apache.org/jira/browse/YARN-9315
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: capacity scheduler
>Affects Versions: 3.1.2
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Attachments: YARN-9315-001.patch, YARN-9315-002.patch
>
>
> TestCapacitySchedulerMetrics fails intermittently because the assert check 
> happens before the allocate completes - observed in YARN-8132
> {code}
> [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.177 
> s <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestCapacitySchedulerMetrics
> [ERROR] 
> testCSMetrics(org.apache.hadoop.yarn.server.resourcemanager.TestCapacitySchedulerMetrics)
>   Time elapsed: 3.11 s  <<< FAILURE!
> java.lang.AssertionError: expected:<2> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at org.junit.Assert.assertEquals(Assert.java:631)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestCapacitySchedulerMetrics.testCSMetrics(TestCapacitySchedulerMetrics.java:101)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:1
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9318) Resources#multiplyAndRoundUp does not consider Resource Types

2019-02-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773199#comment-16773199
 ] 

Hadoop QA commented on YARN-9318:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
 6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 36s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  2s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
45s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
25s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 52m 30s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9318 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12959454/YARN-9318.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 2d851231d36a 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / aa3ad36 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/23456/testReport/ |
| Max. process+thread count | 469 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/23456/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Resources#multiplyAndRoundUp does not consider Resource Types
> ---

[jira] [Commented] (YARN-9265) FPGA plugin fails to recognize Intel Processing Accelerator Card

2019-02-20 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773193#comment-16773193
 ] 

Peter Bacsko commented on YARN-9265:


I made a slight modification in {{FpgaDiscoverer.discover()}}: I replaced the 
existing iterator-based logic with streams/lambdas.
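
As a generic illustration of that kind of rewrite (the device names below are 
invented and not the plugin's real data):
{code:java}
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class IteratorToStreams {
  public static void main(String[] args) {
    List<String> devices = Arrays.asList("acl0", "acl1", "eth0");
    // The iterate-and-collect loop becomes a single stream pipeline.
    List<String> aclDevices = devices.stream()
        .filter(name -> name.startsWith("acl"))
        .collect(Collectors.toList());
    System.out.println(aclDevices);  // prints [acl0, acl1]
  }
}
{code}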

> FPGA plugin fails to recognize Intel Processing Accelerator Card
> 
>
> Key: YARN-9265
> URL: https://issues.apache.org/jira/browse/YARN-9265
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.0
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Critical
> Attachments: YARN-9265-001.patch, YARN-9265-002.patch, 
> YARN-9265-003.patch, YARN-9265-004.patch, YARN-9265-005.patch, 
> YARN-9265-006.patch, YARN-9265-007.patch
>
>
> The plugin cannot autodetect Intel FPGA PAC (Processing Accelerator Card).
> There are two major issues.
> Problem #1
> The output of aocl diagnose:
> {noformat}
> 
> Device Name:
> acl0
>  
> Package Pat:
> /home/pbacsko/inteldevstack/intelFPGA_pro/hld/board/opencl_bsp
>  
> Vendor: Intel Corp
>  
> Physical Dev Name   StatusInformation
>  
> pac_a10_f20 PassedPAC Arria 10 Platform (pac_a10_f20)
>   PCIe 08:00.0
>   FPGA temperature = 79 degrees C.
>  
> DIAGNOSTIC_PASSED
> 
>  
> Call "aocl diagnose " to run diagnose for specified devices
> Call "aocl diagnose all" to run diagnose for all devices
> {noformat}
> The plugin fails to recognize this and fails with the following message:
> {noformat}
> 2019-01-25 06:46:02,834 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaResourcePlugin:
>  Using FPGA vendor plugin: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin
> 2019-01-25 06:46:02,943 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaDiscoverer:
>  Trying to diagnose FPGA information ...
> 2019-01-25 06:46:03,085 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule:
>  Using traffic control bandwidth handler
> 2019-01-25 06:46:03,108 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl:
>  Initializing mounted controller cpu at /sys/fs/cgroup/cpu,cpuacct/yarn
> 2019-01-25 06:46:03,139 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.fpga.FpgaResourceHandlerImpl:
>  FPGA Plugin bootstrap success.
> 2019-01-25 06:46:03,247 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin:
>  Couldn't find (?i)bus:slot.func\s=\s.*, pattern
> 2019-01-25 06:46:03,248 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin:
>  Couldn't find (?i)Total\sCard\sPower\sUsage\s=\s.* pattern
> 2019-01-25 06:46:03,251 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin:
>  Failed to get major-minor number from reading /dev/pac_a10_f30
> 2019-01-25 06:46:03,252 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Failed to 
> bootstrap configured resource subsystems!
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException:
>  No FPGA devices detected!
> {noformat}
> Problem #2
> The plugin assumes that the file name under {{/dev}} can be derived from the 
> "Physical Dev Name", but this is wrong. For example, it thinks that the 
> device file is {{/dev/pac_a10_f30}} which is not the case, the actual 
> file is {{/dev/intel-fpga-port.0}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-9265) FPGA plugin fails to recognize Intel Processing Accelerator Card

2019-02-20 Thread Peter Bacsko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773193#comment-16773193
 ] 

Peter Bacsko edited comment on YARN-9265 at 2/20/19 4:56 PM:
-

I made a slight modification in {{FpgaDiscoverer.discover()}}: I replaced the 
existing iterator-based logic with streams/lambdas.


was (Author: pbacsko):
I made a slight modification in {{FpgaDiscoverer.discover()}}, replaced the 
existing iterator-based logic with some nice streams/lambda logic.

> FPGA plugin fails to recognize Intel Processing Accelerator Card
> 
>
> Key: YARN-9265
> URL: https://issues.apache.org/jira/browse/YARN-9265
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.0
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Critical
> Attachments: YARN-9265-001.patch, YARN-9265-002.patch, 
> YARN-9265-003.patch, YARN-9265-004.patch, YARN-9265-005.patch, 
> YARN-9265-006.patch, YARN-9265-007.patch
>
>
> The plugin cannot autodetect Intel FPGA PAC (Processing Accelerator Card).
> There are two major issues.
> Problem #1
> The output of aocl diagnose:
> {noformat}
> 
> Device Name:
> acl0
>  
> Package Path:
> /home/pbacsko/inteldevstack/intelFPGA_pro/hld/board/opencl_bsp
>  
> Vendor: Intel Corp
>  
> Physical Dev Name   Status    Information
>  
> pac_a10_f20         Passed    PAC Arria 10 Platform (pac_a10_f20)
>   PCIe 08:00.0
>   FPGA temperature = 79 degrees C.
>  
> DIAGNOSTIC_PASSED
> 
>  
> Call "aocl diagnose " to run diagnose for specified devices
> Call "aocl diagnose all" to run diagnose for all devices
> {noformat}
> The plugin fails to recognize this output and aborts with the following message:
> {noformat}
> 2019-01-25 06:46:02,834 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaResourcePlugin:
>  Using FPGA vendor plugin: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin
> 2019-01-25 06:46:02,943 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaDiscoverer:
>  Trying to diagnose FPGA information ...
> 2019-01-25 06:46:03,085 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule:
>  Using traffic control bandwidth handler
> 2019-01-25 06:46:03,108 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl:
>  Initializing mounted controller cpu at /sys/fs/cgroup/cpu,cpuacct/yarn
> 2019-01-25 06:46:03,139 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.fpga.FpgaResourceHandlerImpl:
>  FPGA Plugin bootstrap success.
> 2019-01-25 06:46:03,247 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin:
>  Couldn't find (?i)bus:slot.func\s=\s.*, pattern
> 2019-01-25 06:46:03,248 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin:
>  Couldn't find (?i)Total\sCard\sPower\sUsage\s=\s.* pattern
> 2019-01-25 06:46:03,251 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin:
>  Failed to get major-minor number from reading /dev/pac_a10_f30
> 2019-01-25 06:46:03,252 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Failed to 
> bootstrap configured resource subsystems!
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException:
>  No FPGA devices detected!
> {noformat}
> Problem #2
> The plugin assumes that the file name under {{/dev}} can be derived from the 
> "Physical Dev Name", but this assumption is wrong. For example, it derives 
> the device file {{/dev/pac_a10_f30}}, which does not exist; the actual 
> file is {{/dev/intel-fpga-port.0}}.






[jira] [Commented] (YARN-8625) Aggregate Resource Allocation for each job is not present in ATS

2019-02-20 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773191#comment-16773191
 ] 

Prabhu Joseph commented on YARN-8625:
-

[~rohithsharma] [~eepayne] Can you review this jira as well when you get time?

> Aggregate Resource Allocation for each job is not present in ATS
> 
>
> Key: YARN-8625
> URL: https://issues.apache.org/jira/browse/YARN-8625
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2
>Affects Versions: 2.7.4
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: 0001-YARN-8625.patch, 0002-YARN-8625.patch
>
>
> The Aggregate Resource Allocation shown on the RM UI for a finished job is a 
> very useful metric for understanding how much resource a job has consumed. 
> However, it does not get stored in ATS.






[jira] [Updated] (YARN-9265) FPGA plugin fails to recognize Intel Processing Accelerator Card

2019-02-20 Thread Peter Bacsko (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bacsko updated YARN-9265:
---
Attachment: YARN-9265-007.patch

> FPGA plugin fails to recognize Intel Processing Accelerator Card
> 
>
> Key: YARN-9265
> URL: https://issues.apache.org/jira/browse/YARN-9265
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.1.0
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Critical
> Attachments: YARN-9265-001.patch, YARN-9265-002.patch, 
> YARN-9265-003.patch, YARN-9265-004.patch, YARN-9265-005.patch, 
> YARN-9265-006.patch, YARN-9265-007.patch
>
>
> The plugin cannot autodetect Intel FPGA PAC (Processing Accelerator Card).
> There are two major issues.
> Problem #1
> The output of aocl diagnose:
> {noformat}
> 
> Device Name:
> acl0
>  
> Package Path:
> /home/pbacsko/inteldevstack/intelFPGA_pro/hld/board/opencl_bsp
>  
> Vendor: Intel Corp
>  
> Physical Dev Name   Status    Information
>  
> pac_a10_f20         Passed    PAC Arria 10 Platform (pac_a10_f20)
>   PCIe 08:00.0
>   FPGA temperature = 79 degrees C.
>  
> DIAGNOSTIC_PASSED
> 
>  
> Call "aocl diagnose " to run diagnose for specified devices
> Call "aocl diagnose all" to run diagnose for all devices
> {noformat}
> The plugin fails to recognize this output and aborts with the following message:
> {noformat}
> 2019-01-25 06:46:02,834 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaResourcePlugin:
>  Using FPGA vendor plugin: 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin
> 2019-01-25 06:46:02,943 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.FpgaDiscoverer:
>  Trying to diagnose FPGA information ...
> 2019-01-25 06:46:03,085 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule:
>  Using traffic control bandwidth handler
> 2019-01-25 06:46:03,108 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsHandlerImpl:
>  Initializing mounted controller cpu at /sys/fs/cgroup/cpu,cpuacct/yarn
> 2019-01-25 06:46:03,139 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.fpga.FpgaResourceHandlerImpl:
>  FPGA Plugin bootstrap success.
> 2019-01-25 06:46:03,247 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin:
>  Couldn't find (?i)bus:slot.func\s=\s.*, pattern
> 2019-01-25 06:46:03,248 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin:
>  Couldn't find (?i)Total\sCard\sPower\sUsage\s=\s.* pattern
> 2019-01-25 06:46:03,251 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin:
>  Failed to get major-minor number from reading /dev/pac_a10_f30
> 2019-01-25 06:46:03,252 ERROR 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Failed to 
> bootstrap configured resource subsystems!
> org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerException:
>  No FPGA devices detected!
> {noformat}
> Problem #2
> The plugin assumes that the file name under {{/dev}} can be derived from the 
> "Physical Dev Name", but this assumption is wrong. For example, it derives 
> the device file {{/dev/pac_a10_f30}}, which does not exist; the actual 
> file is {{/dev/intel-fpga-port.0}}.






[jira] [Commented] (YARN-9258) Support to specify allocation tags without constraint in distributed shell CLI

2019-02-20 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773190#comment-16773190
 ] 

Prabhu Joseph commented on YARN-9258:
-

Thanks [~cheersyang], I have attached the patch after rebasing.

> Support to specify allocation tags without constraint in distributed shell CLI
> --
>
> Key: YARN-9258
> URL: https://issues.apache.org/jira/browse/YARN-9258
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: distributed-shell
>Affects Versions: 3.1.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9258-001.patch, YARN-9258-002.patch, 
> YARN-9258-003.patch
>
>
> DistributedShell PlacementSpec fails to parse 
> {color:#d04437}zk=1:spark=1,NOTIN,NODE,zk{color}
> {code}
> java.lang.IllegalArgumentException: Invalid placement spec: 
> zk=1:spark=1,NOTIN,NODE,zk
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.PlacementSpec.parse(PlacementSpec.java:108)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.Client.init(Client.java:462)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDistributedShellWithPlacementConstraint(TestDistributedShell.java:1780)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: 
> org.apache.hadoop.yarn.util.constraint.PlacementConstraintParseException: 
> Source allocation tags is required for a multi placement constraint 
> expression.
>   at 
> org.apache.hadoop.yarn.util.constraint.PlacementConstraintParser.parsePlacementSpec(PlacementConstraintParser.java:740)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.PlacementSpec.parse(PlacementSpec.java:94)
>   ... 16 more
> {code}
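For illustration, the intent is that a bare {{tag=N}} component carries no constraint while {{tag=N,CONSTRAINT}} does. Below is a self-contained sketch of that split; it is not the actual {{PlacementConstraintParser}} logic, only an illustration of the desired behavior.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of parsing specs such as "zk=1:spark=1,NOTIN,NODE,zk":
 * a bare "tag=N" component is an allocation request without a constraint,
 * while "tag=N,..." carries a constraint expression.
 */
public class PlacementSpecSketch {
  public static Map<String, String> parse(String spec) {
    Map<String, String> tagToConstraint = new LinkedHashMap<>();
    for (String part : spec.split(":")) {
      int comma = part.indexOf(',');
      if (comma < 0) {
        // e.g. "zk=1" -> allocation tag with no constraint attached
        tagToConstraint.put(part, "");
      } else {
        // e.g. "spark=1,NOTIN,NODE,zk" -> tag plus constraint expression
        tagToConstraint.put(part.substring(0, comma),
            part.substring(comma + 1));
      }
    }
    return tagToConstraint;
  }

  public static void main(String[] args) {
    // Prints {zk=1=, spark=1=NOTIN,NODE,zk}
    System.out.println(parse("zk=1:spark=1,NOTIN,NODE,zk"));
  }
}
{code}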






[jira] [Commented] (YARN-7266) Timeline Server event handler threads locked

2019-02-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773185#comment-16773185
 ] 

Hadoop QA commented on YARN-7266:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
27s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
51s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m  1s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 
28s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 36s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 16 new + 0 unchanged - 0 fixed = 16 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 50s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
52s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
57s{color} | {color:green} hadoop-yarn-server-applicationhistoryservice in the 
patch passed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
51s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}101m 12s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-7266 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12959452/YARN-7266-001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ddbd720e1e4e 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 
5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / aa3ad36 |
| maven | v

[jira] [Updated] (YARN-9258) Support to specify allocation tags without constraint in distributed shell CLI

2019-02-20 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9258:

Attachment: YARN-9258-003.patch

> Support to specify allocation tags without constraint in distributed shell CLI
> --
>
> Key: YARN-9258
> URL: https://issues.apache.org/jira/browse/YARN-9258
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: distributed-shell
>Affects Versions: 3.1.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9258-001.patch, YARN-9258-002.patch, 
> YARN-9258-003.patch
>
>
> DistributedShell PlacementSpec fails to parse 
> {color:#d04437}zk=1:spark=1,NOTIN,NODE,zk{color}
> {code}
> java.lang.IllegalArgumentException: Invalid placement spec: 
> zk=1:spark=1,NOTIN,NODE,zk
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.PlacementSpec.parse(PlacementSpec.java:108)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.Client.init(Client.java:462)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDistributedShellWithPlacementConstraint(TestDistributedShell.java:1780)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: 
> org.apache.hadoop.yarn.util.constraint.PlacementConstraintParseException: 
> Source allocation tags is required for a multi placement constraint 
> expression.
>   at 
> org.apache.hadoop.yarn.util.constraint.PlacementConstraintParser.parsePlacementSpec(PlacementConstraintParser.java:740)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.PlacementSpec.parse(PlacementSpec.java:94)
>   ... 16 more
> {code}






[jira] [Commented] (YARN-6538) Inter Queue preemption is not happening when DRF is configured

2019-02-20 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-6538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773177#comment-16773177
 ] 

Eric Payne commented on YARN-6538:
--

In practice, this seems to me to be an uncommon use case. For example, in our 
clusters, we have an average of about 7 vcores per gigabyte, and we use 
preemption all the time. In the above example, there are 0.05 vcores per 
gigabyte. This seems like a fringe case where preemption may not be happening 
because of rounding calculations.

> Inter Queue preemption is not happening when DRF is configured
> --
>
> Key: YARN-6538
> URL: https://issues.apache.org/jira/browse/YARN-6538
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 2.8.0
>Reporter: Sunil Govindan
>Assignee: Sunil Govindan
>Priority: Major
>
> Consider a cluster whose capacity has abundant memory but relatively few 
> vcores. If applications have more demand, vcores might be exhausted. 
> Inter-queue preemption ideally has to kick in once vcores are over-utilized. 
> However, preemption is not happening.
> Analysis:
> In {{AbstractPreemptableResourceCalculator.computeFixpointAllocation}}, 
> {code}
> // assign all cluster resources until no more demand, or no resources are
> // left
> while (!orderedByNeed.isEmpty() && Resources.greaterThan(rc, totGuarant,
> unassigned, Resources.none())) {
> {code}
> will loop even when vcores are 0 (because memory is still positive). Hence 
> idealAssigned accumulates extra vcores, which causes the no-preemption cases.
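A minimal sketch of the loop-guard idea: stop the fixpoint loop once any tracked dimension is exhausted, instead of looping while the aggregate is still positive. {{ResourceVector}} is an illustrative stand-in for Hadoop's actual {{Resource}} API, and the decrements are dummies.

{code:java}
/** Illustrative stand-in for a two-dimensional resource. */
class ResourceVector {
  long memory;
  long vcores;

  boolean anyExhausted() {
    return memory <= 0 || vcores <= 0;
  }
}

class FixpointSketch {
  static void computeFixpointAllocation(ResourceVector unassigned) {
    // The original condition loops while "unassigned > none", which holds as
    // long as memory > 0 even when vcores == 0, so vcores keep being added
    // to idealAssigned. Guarding on each dimension avoids that.
    while (!unassigned.anyExhausted()) {
      // ... assign resources to the neediest queue, then decrement
      unassigned.memory -= 1024;
      unassigned.vcores -= 1;
    }
  }
}
{code}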






[jira] [Updated] (YARN-9318) Resources#multiplyAndRoundUp does not consider Resource Types

2019-02-20 Thread Szilard Nemeth (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth updated YARN-9318:
-
Attachment: YARN-9318.001.patch

> Resources#multiplyAndRoundUp does not consider Resource Types
> -
>
> Key: YARN-9318
> URL: https://issues.apache.org/jira/browse/YARN-9318
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-9318.001.patch
>
>
> org.apache.hadoop.yarn.util.resource.Resources#multiplyAndRoundUp only deals 
> with memory and vcores while computing the rounded value. It should consider 
> custom Resource Types as well.






[jira] [Created] (YARN-9318) Resources#multiplyAndRoundUp does not consider Resource Types

2019-02-20 Thread Szilard Nemeth (JIRA)
Szilard Nemeth created YARN-9318:


 Summary: Resources#multiplyAndRoundUp does not consider Resource 
Types
 Key: YARN-9318
 URL: https://issues.apache.org/jira/browse/YARN-9318
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Szilard Nemeth
Assignee: Szilard Nemeth


org.apache.hadoop.yarn.util.resource.Resources#multiplyAndRoundUp only deals 
with memory and vcores while computing the rounded value. It should consider 
custom Resource Types as well.
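A minimal sketch of the generalized rounding, using a {{long[]}} as an illustrative stand-in for the resource vector instead of Hadoop's {{Resource}}/{{ResourceInformation}} types:

{code:java}
public class MultiplyAndRoundUpSketch {
  /**
   * Multiplies every resource dimension by the factor and rounds each
   * product up, rather than handling only memory and vcores. Index 2 and
   * beyond could be custom resource types (e.g. "gpu") when configured.
   */
  static long[] multiplyAndRoundUp(long[] resources, double by) {
    long[] out = new long[resources.length];
    for (int i = 0; i < resources.length; i++) {
      out[i] = (long) Math.ceil(resources[i] * by);
    }
    return out;
  }
}
{code}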






[jira] [Commented] (YARN-9318) Resources#multiplyAndRoundUp does not consider Resource Types

2019-02-20 Thread Gergely Pollak (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773147#comment-16773147
 ] 

Gergely Pollak commented on YARN-9318:
--

[~snemeth] thank you for the patch, LGTM +1 (Non-binding).

> Resources#multiplyAndRoundUp does not consider Resource Types
> -
>
> Key: YARN-9318
> URL: https://issues.apache.org/jira/browse/YARN-9318
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-9318.001.patch
>
>
> org.apache.hadoop.yarn.util.resource.Resources#multiplyAndRoundUp only deals 
> with memory and vcores while computing the rounded value. It should consider 
> custom Resource Types as well.






[jira] [Commented] (YARN-9287) Consecutive String Builder Append Should Reuse

2019-02-20 Thread Ayush Saxena (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773139#comment-16773139
 ] 

Ayush Saxena commented on YARN-9287:


Ping [~bibinchundatt] [~giovanni.fumarola]

Can someone help with the review? :)

> Consecutive String Builder Append Should Reuse
> --
>
> Key: YARN-9287
> URL: https://issues.apache.org/jira/browse/YARN-9287
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: YARN-9287-01.patch, YARN-9287-02.patch, 
> YARN-9287-03.patch, YARN-9287-04.patch
>
>
>  Consecutive calls to StringBuffer/StringBuilder {{append}} should be chained, 
> reusing the target object. This can improve performance by producing smaller 
> bytecode, reducing overhead and improving inlining.
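A minimal generic illustration of the pattern (plain Java, not a specific hunk from the patch):

{code:java}
public class AppendChaining {
  static String before(String user, int count) {
    // Consecutive, unchained appends: each statement re-loads the local.
    StringBuilder sb = new StringBuilder();
    sb.append("user=");
    sb.append(user);
    sb.append(", count=");
    sb.append(count);
    return sb.toString();
  }

  static String after(String user, int count) {
    // Chained appends reuse the value returned by append(), producing
    // slightly smaller bytecode and friendlier inlining.
    return new StringBuilder()
        .append("user=").append(user)
        .append(", count=").append(count)
        .toString();
  }
}
{code}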






[jira] [Commented] (YARN-7266) Timeline Server event handler threads locked

2019-02-20 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-7266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773107#comment-16773107
 ] 

Prabhu Joseph commented on YARN-7266:
-

Currently, the {{ObjectMapper}} instances for the TimelineWebService and 
AHSWebService DAO classes are reused while writing responses through 
{{YarnJacksonJaxbJsonProvider}}.

{code:java}
   at 
org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider.locateMapper(YarnJacksonJaxbJsonProvider.java:56)
at 
org.codehaus.jackson.jaxrs.JacksonJsonProvider.writeTo(JacksonJsonProvider.java:501)
at 
com.sun.jersey.spi.container.ContainerResponse.write(ContainerResponse.java:306)
 at 
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1437)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at 
com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:886)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
at 
com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
at 
com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
at 
com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.security.http.RestCsrfPreventionFilter$ServletFilterHttpInteraction.proceed(RestCsrfPreventionFilter.java:269)
at 
org.apache.hadoop.security.http.RestCsrfPreventionFilter.handleHttpInteraction(RestCsrfPreventionFilter.java:197)
at 
org.apache.hadoop.security.http.RestCsrfPreventionFilter.doFilter(RestCsrfPreventionFilter.java:209)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:617)
at 
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:294)
at 
org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:576)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.security.http.CrossOriginFilter.doFilter(CrossOriginFilter.java:95)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1400)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
 

[jira] [Updated] (YARN-7266) Timeline Server event handler threads locked

2019-02-20 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-7266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-7266:

Attachment: YARN-7266-001.patch

> Timeline Server event handler threads locked
> 
>
> Key: YARN-7266
> URL: https://issues.apache.org/jira/browse/YARN-7266
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: ATSv2, timelineserver
>Affects Versions: 2.7.3
>Reporter: Venkata Puneet Ravuri
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-7266-001.patch
>
>
> Event handlers for the Timeline Server seem to take a lock while parsing the 
> HTTP headers of a request. This causes all other threads to wait and slows 
> down the overall performance of the Timeline Server. We have ResourceManager 
> metrics enabled to be sent to the Timeline Server. Because of the high load on 
> the ResourceManager, the metrics to be sent are getting backlogged, in turn 
> increasing the heap footprint of the ResourceManager (due to pending metrics).
> This is the complete stack trace of a blocked thread on timeline server:-
> "2079644967@qtp-1658980982-4560" #4632 daemon prio=5 os_prio=0 
> tid=0x7f6ba490a000 nid=0x5eb waiting for monitor entry 
> [0x7f6b9142c000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> com.sun.xml.bind.v2.runtime.reflect.opt.AccessorInjector.prepare(AccessorInjector.java:82)
> - waiting to lock <0x0005c0621860> (a java.lang.Class for 
> com.sun.xml.bind.v2.runtime.reflect.opt.AccessorInjector)
> at 
> com.sun.xml.bind.v2.runtime.reflect.opt.OptimizedAccessorFactory.get(OptimizedAccessorFactory.java:168)
> at 
> com.sun.xml.bind.v2.runtime.reflect.Accessor$FieldReflection.optimize(Accessor.java:282)
> at 
> com.sun.xml.bind.v2.runtime.property.SingleElementNodeProperty.(SingleElementNodeProperty.java:94)
> at sun.reflect.GeneratedConstructorAccessor52.newInstance(Unknown 
> Source)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown 
> Source)
> at java.lang.reflect.Constructor.newInstance(Unknown Source)
> at 
> com.sun.xml.bind.v2.runtime.property.PropertyFactory.create(PropertyFactory.java:128)
> at 
> com.sun.xml.bind.v2.runtime.ClassBeanInfoImpl.(ClassBeanInfoImpl.java:183)
> at 
> com.sun.xml.bind.v2.runtime.JAXBContextImpl.getOrCreate(JAXBContextImpl.java:532)
> at 
> com.sun.xml.bind.v2.runtime.JAXBContextImpl.getOrCreate(JAXBContextImpl.java:551)
> at 
> com.sun.xml.bind.v2.runtime.property.ArrayElementProperty.(ArrayElementProperty.java:112)
> at 
> com.sun.xml.bind.v2.runtime.property.ArrayElementNodeProperty.(ArrayElementNodeProperty.java:62)
> at sun.reflect.GeneratedConstructorAccessor19.newInstance(Unknown 
> Source)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown 
> Source)
> at java.lang.reflect.Constructor.newInstance(Unknown Source)
> at 
> com.sun.xml.bind.v2.runtime.property.PropertyFactory.create(PropertyFactory.java:128)
> at 
> com.sun.xml.bind.v2.runtime.ClassBeanInfoImpl.(ClassBeanInfoImpl.java:183)
> at 
> com.sun.xml.bind.v2.runtime.JAXBContextImpl.getOrCreate(JAXBContextImpl.java:532)
> at 
> com.sun.xml.bind.v2.runtime.JAXBContextImpl.(JAXBContextImpl.java:347)
> at 
> com.sun.xml.bind.v2.runtime.JAXBContextImpl$JAXBContextBuilder.build(JAXBContextImpl.java:1170)
> at 
> com.sun.xml.bind.v2.ContextFactory.createContext(ContextFactory.java:145)
> at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
> at java.lang.reflect.Method.invoke(Unknown Source)
> at javax.xml.bind.ContextFinder.newInstance(Unknown Source)
> at javax.xml.bind.ContextFinder.newInstance(Unknown Source)
> at javax.xml.bind.ContextFinder.find(Unknown Source)
> at javax.xml.bind.JAXBContext.newInstance(Unknown Source)
> at javax.xml.bind.JAXBContext.newInstance(Unknown Source)
> at 
> com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.buildModelAndSchemas(WadlGeneratorJAXBGrammarGenerator.java:412)
> at 
> com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.createExternalGrammar(WadlGeneratorJAXBGrammarGenerator.java:352)
> at 
> com.sun.jersey.server.wadl.WadlBuilder.generate(WadlBuilder.java:115)
> at 
> com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:104)
> at 
> com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:120)
> at 
> com.sun.jersey.server.impl.wadl.WadlMethodFactory$W

[jira] [Commented] (YARN-9258) Support to specify allocation tags without constraint in distributed shell CLI

2019-02-20 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773049#comment-16773049
 ] 

Weiwei Yang commented on YARN-9258:
---

Hi [~Prabhu Joseph]

Sure, I'll help to review this.

Can you rebase the patch onto the latest trunk? It doesn't seem to apply anymore. At 
a quick glance, the patch looks good; I will comment once I get an up-to-date patch. 
Thanks.

> Support to specify allocation tags without constraint in distributed shell CLI
> --
>
> Key: YARN-9258
> URL: https://issues.apache.org/jira/browse/YARN-9258
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: distributed-shell
>Affects Versions: 3.1.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-9258-001.patch, YARN-9258-002.patch
>
>
> DistributedShell PlacementSpec fails to parse 
> {color:#d04437}zk=1:spark=1,NOTIN,NODE,zk{color}
> {code}
> java.lang.IllegalArgumentException: Invalid placement spec: 
> zk=1:spark=1,NOTIN,NODE,zk
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.PlacementSpec.parse(PlacementSpec.java:108)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.Client.init(Client.java:462)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDistributedShellWithPlacementConstraint(TestDistributedShell.java:1780)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: 
> org.apache.hadoop.yarn.util.constraint.PlacementConstraintParseException: 
> Source allocation tags is required for a multi placement constraint 
> expression.
>   at 
> org.apache.hadoop.yarn.util.constraint.PlacementConstraintParser.parsePlacementSpec(PlacementConstraintParser.java:740)
>   at 
> org.apache.hadoop.yarn.applications.distributedshell.PlacementSpec.parse(PlacementSpec.java:94)
>   ... 16 more
> {code}






[jira] [Commented] (YARN-9315) TestCapacitySchedulerMetrics fails intermittently

2019-02-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772998#comment-16772998
 ] 

Hadoop QA commented on YARN-9315:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 55s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 33s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 33s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 95m 26s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
28s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}147m 14s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueManagementDynamicEditPolicy
 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9315 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12959415/YARN-9315-001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 8ba83665eb53 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 
5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 1d30fd9 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/23454/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/23454/artifact/out/patch-unit-had

[jira] [Assigned] (YARN-9317) DefaultAMSProcessor#allocate timelineServiceV2Enabled check is costly

2019-02-20 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph reassigned YARN-9317:
---

Assignee: Prabhu Joseph

> DefaultAMSProcessor#allocate  timelineServiceV2Enabled check is costly
> --
>
> Key: YARN-9317
> URL: https://issues.apache.org/jira/browse/YARN-9317
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Prabhu Joseph
>Priority: Major
>
> {code}
> if (YarnConfiguration.timelineServiceV2Enabled(
>  getRmContext().getYarnConfiguration())) 
> {code}
> The check is required only once; it can be done in {{DefaultAMSProcessor#init}} 
> and the result assigned to a boolean field.






[jira] [Created] (YARN-9317) DefaultAMSProcessor#allocate timelineServiceV2Enabled check is costly

2019-02-20 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-9317:
--

 Summary: DefaultAMSProcessor#allocate  timelineServiceV2Enabled 
check is costly
 Key: YARN-9317
 URL: https://issues.apache.org/jira/browse/YARN-9317
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bibin A Chundatt


{code}
if (YarnConfiguration.timelineServiceV2Enabled(
 getRmContext().getYarnConfiguration())) 
{code}

The check is required only once; it can be done in {{DefaultAMSProcessor#init}} and 
the result assigned to a boolean field.
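A self-contained sketch of the proposed caching; the class shape and the configuration key below are illustrative stand-ins, not Hadoop's actual API:

{code:java}
/**
 * Hypothetical sketch: evaluate the configuration flag once in init() and
 * reuse the cached boolean on the allocate() hot path, instead of parsing
 * the configuration on every heartbeat.
 */
public class DefaultAMSProcessorSketch {
  private volatile boolean timelineServiceV2Enabled;

  void init(java.util.Properties conf) {
    // Done once at startup; the property key is illustrative only.
    timelineServiceV2Enabled = Boolean.parseBoolean(
        conf.getProperty("timeline-service.v2.enabled", "false"));
  }

  void allocate() {
    if (timelineServiceV2Enabled) {
      // publish allocation events to ATSv2 ...
    }
  }
}
{code}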






[jira] [Commented] (YARN-8132) Final Status of applications shown as UNDEFINED in ATS app queries

2019-02-20 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772933#comment-16772933
 ] 

Prabhu Joseph commented on YARN-8132:
-

Thanks [~bibinchundatt] for the review. I have reported YARN-9315 
({{TestCapacitySchedulerMetrics}}) and YARN-9316 
({{TestPlacementConstraintsUtil}}); both test cases are failing intermittently.

> Final Status of applications shown as UNDEFINED in ATS app queries
> --
>
> Key: YARN-8132
> URL: https://issues.apache.org/jira/browse/YARN-8132
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2, timelineservice
>Reporter: Charan Hebri
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-8132-001.patch, YARN-8132-002.patch, 
> YARN-8132-003.patch, YARN-8132-004.patch
>
>
> Final Status is shown as UNDEFINED for applications that are KILLED/FAILED. A 
> sample request/response with INFO field for an application,
> {noformat}
> 2018-04-09 13:10:02,126 INFO  reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:getApp(1693)) - Received URL 
> /ws/v2/timeline/apps/application_1523259757659_0003?fields=INFO from user 
> hrt_qa
> 2018-04-09 13:10:02,156 INFO  reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:getApp(1716)) - Processed URL 
> /ws/v2/timeline/apps/application_1523259757659_0003?fields=INFO (Took 30 
> ms.){noformat}
> {noformat}
> {
>   "metrics": [],
>   "events": [],
>   "createdtime": 1523263360719,
>   "idprefix": 0,
>   "id": "application_1523259757659_0003",
>   "type": "YARN_APPLICATION",
>   "info": {
> "YARN_APPLICATION_CALLER_CONTEXT": "CLI",
> "YARN_APPLICATION_DIAGNOSTICS_INFO": "Application 
> application_1523259757659_0003 was killed by user xxx_xx at XXX.XXX.XXX.XXX",
> "YARN_APPLICATION_FINAL_STATUS": "UNDEFINED",
> "YARN_APPLICATION_NAME": "Sleep job",
> "YARN_APPLICATION_USER": "hrt_qa",
> "YARN_APPLICATION_UNMANAGED_APPLICATION": false,
> "FROM_ID": 
> "yarn-cluster!hrt_qa!test_flow!1523263360719!application_1523259757659_0003",
> "UID": "yarn-cluster!application_1523259757659_0003",
> "YARN_APPLICATION_VIEW_ACLS": " ",
> "YARN_APPLICATION_SUBMITTED_TIME": 1523263360718,
> "YARN_AM_CONTAINER_LAUNCH_COMMAND": [
>   "$JAVA_HOME/bin/java -Djava.io.tmpdir=$PWD/tmp 
> -Dlog4j.configuration=container-log4j.properties 
> -Dyarn.app.container.log.dir= -Dyarn.app.container.log.filesize=0 
> -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog 
> -Dhdp.version=3.0.0.0-1163 -Xmx819m -Dhdp.version=3.0.0.0-1163 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/stdout 
> 2>/stderr "
> ],
> "YARN_APPLICATION_QUEUE": "default",
> "YARN_APPLICATION_TYPE": "MAPREDUCE",
> "YARN_APPLICATION_PRIORITY": 0,
> "YARN_APPLICATION_LATEST_APP_ATTEMPT": 
> "appattempt_1523259757659_0003_01",
> "YARN_APPLICATION_TAGS": [
>   "timeline_flow_name_tag:test_flow"
> ],
> "YARN_APPLICATION_STATE": "KILLED"
>   },
>   "configs": {},
>   "isrelatedto": {},
>   "relatesto": {}
> }{noformat}
> This is different from what the ResourceManager reports. For KILLED 
> applications the final status is KILLED, and for FAILED applications it is 
> FAILED. This behavior is seen in ATSv2 as well as older versions of ATS. 
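A sketch of the expected mapping, with stand-in enums rather than Hadoop's actual {{RMAppState}}/{{FinalApplicationStatus}} types; the real fix would live wherever the ATS publisher computes the final status:

{code:java}
public class FinalStatusSketch {
  enum AppState { FINISHED, KILLED, FAILED, RUNNING }
  enum FinalStatus { SUCCEEDED, KILLED, FAILED, UNDEFINED }

  static FinalStatus finalStatusOf(AppState state, boolean amReportedSuccess) {
    switch (state) {
      case KILLED:
        return FinalStatus.KILLED;    // match the RM: KILLED, not UNDEFINED
      case FAILED:
        return FinalStatus.FAILED;    // match the RM: FAILED, not UNDEFINED
      case FINISHED:
        return amReportedSuccess ? FinalStatus.SUCCEEDED : FinalStatus.FAILED;
      default:
        return FinalStatus.UNDEFINED; // still running / not yet final
    }
  }
}
{code}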






[jira] [Updated] (YARN-9315) TestCapacitySchedulerMetrics fails intermittently

2019-02-20 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9315:

Description: 
TestCapacitySchedulerMetrics fails intermittently because the assert check happens 
before the allocation completes - observed in YARN-8132

{code}
[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.177 s 
<<< FAILURE! - in 
org.apache.hadoop.yarn.server.resourcemanager.TestCapacitySchedulerMetrics
[ERROR] 
testCSMetrics(org.apache.hadoop.yarn.server.resourcemanager.TestCapacitySchedulerMetrics)
  Time elapsed: 3.11 s  <<< FAILURE!
java.lang.AssertionError: expected:<2> but was:<1>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:645)
at org.junit.Assert.assertEquals(Assert.java:631)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestCapacitySchedulerMetrics.testCSMetrics(TestCapacitySchedulerMetrics.java:101)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:1
{code}

  was:TestCapacitySchedulerMetrics fails intermittently as assert check happens 
before the allocate completes.
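A common remedy for this kind of race is to poll the condition instead of asserting immediately, e.g. via {{GenericTestUtils.waitFor}}. A sketch, where {{getAllocatedCount()}} is an illustrative stand-in for the metric the test actually checks:

{code:java}
import java.util.concurrent.TimeoutException;
import org.apache.hadoop.test.GenericTestUtils;

public class WaitForMetricSketch {
  /**
   * Polls until the scheduler metric reaches the expected value instead of
   * asserting right after the allocate call returns.
   */
  static void awaitAllocations(final int expected)
      throws TimeoutException, InterruptedException {
    GenericTestUtils.waitFor(
        () -> getAllocatedCount() == expected,
        100 /* check every 100 ms */,
        10_000 /* give up after 10 s */);
  }

  static int getAllocatedCount() {
    return 0; // placeholder: read the real CapacityScheduler metric here
  }
}
{code}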


> TestCapacitySchedulerMetrics fails intermittently
> -
>
> Key: YARN-9315
> URL: https://issues.apache.org/jira/browse/YARN-9315
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: capacity scheduler
>Affects Versions: 3.1.2
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Attachments: YARN-9315-001.patch
>
>
> TestCapacitySchedulerMetrics fails intermittently because the assert check 
> happens before the allocation completes - observed in YARN-8132
> {code}
> [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.177 
> s <<< FAILURE! - in 
> org.apache.hadoop.yarn.server.resourcemanager.TestCapacitySchedulerMetrics
> [ERROR] 
> testCSMetrics(org.apache.hadoop.yarn.server.resourcemanager.TestCapacitySchedulerMetrics)
>   Time elapsed: 3.11 s  <<< FAILURE!
> java.lang.AssertionError: expected:<2> but was:<1>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at org.junit.Assert.assertEquals(Assert.java:631)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.TestCapacitySchedulerMetrics.testCSMetrics(TestCapacitySchedulerMetrics.java:101)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcce

[jira] [Created] (YARN-9316) TestPlacementConstraintsUtil#testInterAppConstraintsByAppID fails intermittently

2019-02-20 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created YARN-9316:
---

 Summary: 
TestPlacementConstraintsUtil#testInterAppConstraintsByAppID fails intermittently
 Key: YARN-9316
 URL: https://issues.apache.org/jira/browse/YARN-9316
 Project: Hadoop YARN
  Issue Type: Test
  Components: capacity scheduler
Affects Versions: 3.1.2
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


TestPlacementConstraintsUtil#testInterAppConstraintsByAppID fails 
intermittently - observed in YARN-8132

{code}
[ERROR] 
testInterAppConstraintsByAppID(org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.TestPlacementConstraintsUtil)
  Time elapsed: 0.339 s  <<< FAILURE!
java.lang.AssertionError
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertFalse(Assert.java:64)
at org.junit.Assert.assertFalse(Assert.java:74)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.TestPlacementConstraintsUtil.testInterAppConstraintsByAppID(TestPlacementConstraintsUtil.java:965)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{code}






[jira] [Updated] (YARN-9316) TestPlacementConstraintsUtil#testInterAppConstraintsByAppID fails intermittently

2019-02-20 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9316:

Priority: Minor  (was: Major)

> TestPlacementConstraintsUtil#testInterAppConstraintsByAppID fails 
> intermittently
> 
>
> Key: YARN-9316
> URL: https://issues.apache.org/jira/browse/YARN-9316
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: capacity scheduler
>Affects Versions: 3.1.2
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
>
> TestPlacementConstraintsUtil#testInterAppConstraintsByAppID fails 
> intermittently - observed in YARN-8132
> {code}
> [ERROR] testInterAppConstraintsByAppID(org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.TestPlacementConstraintsUtil)  Time elapsed: 0.339 s  <<< FAILURE!
> java.lang.AssertionError
> 	at org.junit.Assert.fail(Assert.java:86)
> 	at org.junit.Assert.assertTrue(Assert.java:41)
> 	at org.junit.Assert.assertFalse(Assert.java:64)
> 	at org.junit.Assert.assertFalse(Assert.java:74)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.TestPlacementConstraintsUtil.testInterAppConstraintsByAppID(TestPlacementConstraintsUtil.java:965)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> 	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> 	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> 	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> 	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> 	at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> 	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}






[jira] [Resolved] (YARN-9296) [Timeline Server] FinalStatus is displayed wrong for killed and failed applications

2019-02-20 Thread Bibin A Chundatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt resolved YARN-9296.

Resolution: Duplicate

> [Timeline Server] FinalStatus is displayed wrong for killed and failed 
> applications
> ---
>
> Key: YARN-9296
> URL: https://issues.apache.org/jira/browse/YARN-9296
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Nallasivan
>Assignee: Prabhu Joseph
>Priority: Minor
>
> In Timeline Server (1.5), the FinalStatus of applications that are killed or 
> failed is displayed as UNDEFINED in both the GUI and the REST API.






[jira] [Commented] (YARN-8132) Final Status of applications shown as UNDEFINED in ATS app queries

2019-02-20 Thread Bibin A Chundatt (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772926#comment-16772926
 ] 

Bibin A Chundatt commented on YARN-8132:


Overall, the patch looks good to me.

Will check in by tomorrow if no objections.

[~Prabhu Joseph] Is there any JIRA to track the TestPlacementConstraintsUtil failure?

> Final Status of applications shown as UNDEFINED in ATS app queries
> --
>
> Key: YARN-8132
> URL: https://issues.apache.org/jira/browse/YARN-8132
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2, timelineservice
>Reporter: Charan Hebri
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-8132-001.patch, YARN-8132-002.patch, 
> YARN-8132-003.patch, YARN-8132-004.patch
>
>
> Final Status is shown as UNDEFINED for applications that are KILLED/FAILED. A 
> sample request/response with the INFO field for an application:
> {noformat}
> 2018-04-09 13:10:02,126 INFO  reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:getApp(1693)) - Received URL 
> /ws/v2/timeline/apps/application_1523259757659_0003?fields=INFO from user 
> hrt_qa
> 2018-04-09 13:10:02,156 INFO  reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:getApp(1716)) - Processed URL 
> /ws/v2/timeline/apps/application_1523259757659_0003?fields=INFO (Took 30 
> ms.){noformat}
> {noformat}
> {
>   "metrics": [],
>   "events": [],
>   "createdtime": 1523263360719,
>   "idprefix": 0,
>   "id": "application_1523259757659_0003",
>   "type": "YARN_APPLICATION",
>   "info": {
> "YARN_APPLICATION_CALLER_CONTEXT": "CLI",
> "YARN_APPLICATION_DIAGNOSTICS_INFO": "Application 
> application_1523259757659_0003 was killed by user xxx_xx at XXX.XXX.XXX.XXX",
> "YARN_APPLICATION_FINAL_STATUS": "UNDEFINED",
> "YARN_APPLICATION_NAME": "Sleep job",
> "YARN_APPLICATION_USER": "hrt_qa",
> "YARN_APPLICATION_UNMANAGED_APPLICATION": false,
> "FROM_ID": 
> "yarn-cluster!hrt_qa!test_flow!1523263360719!application_1523259757659_0003",
> "UID": "yarn-cluster!application_1523259757659_0003",
> "YARN_APPLICATION_VIEW_ACLS": " ",
> "YARN_APPLICATION_SUBMITTED_TIME": 1523263360718,
> "YARN_AM_CONTAINER_LAUNCH_COMMAND": [
>   "$JAVA_HOME/bin/java -Djava.io.tmpdir=$PWD/tmp 
> -Dlog4j.configuration=container-log4j.properties 
> -Dyarn.app.container.log.dir= -Dyarn.app.container.log.filesize=0 
> -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog 
> -Dhdp.version=3.0.0.0-1163 -Xmx819m -Dhdp.version=3.0.0.0-1163 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/stdout 
> 2>/stderr "
> ],
> "YARN_APPLICATION_QUEUE": "default",
> "YARN_APPLICATION_TYPE": "MAPREDUCE",
> "YARN_APPLICATION_PRIORITY": 0,
> "YARN_APPLICATION_LATEST_APP_ATTEMPT": 
> "appattempt_1523259757659_0003_01",
> "YARN_APPLICATION_TAGS": [
>   "timeline_flow_name_tag:test_flow"
> ],
> "YARN_APPLICATION_STATE": "KILLED"
>   },
>   "configs": {},
>   "isrelatedto": {},
>   "relatesto": {}
> }{noformat}
> This differs from what the Resource Manager reports: for KILLED applications 
> the final status is KILLED, and for FAILED applications it is FAILED. This 
> behavior is seen in ATSv2 as well as in older versions of ATS.
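>
> For reference, such a query can be issued directly against the timeline 
> reader REST endpoint, for example (host and port are deployment-specific; the 
> default reader port 8188 is assumed here):
> {noformat}
> curl "http://<timeline-reader-host>:8188/ws/v2/timeline/apps/application_1523259757659_0003?fields=INFO"
> {noformat}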






[jira] [Commented] (YARN-9314) Fair Scheduler: Queue Info mistake when configured same queue name at same level

2019-02-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772903#comment-16772903
 ] 

Hadoop QA commented on YARN-9314:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 58s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 28s{color} | {color:orange} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 5 new + 18 unchanged - 0 fixed = 23 total (was 18) {color} 
|
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 50s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 88m 19s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}134m 21s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-9314 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12959404/YARN-9341.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux d57cddcdc830 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 1d30fd9 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-YARN-Build/23453/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/23453/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN

[jira] [Updated] (YARN-9315) TestCapacitySchedulerMetrics fails intermittently

2019-02-20 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated YARN-9315:

Attachment: YARN-9315-001.patch

> TestCapacitySchedulerMetrics fails intermittently
> -
>
> Key: YARN-9315
> URL: https://issues.apache.org/jira/browse/YARN-9315
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: capacity scheduler
>Affects Versions: 3.1.2
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Attachments: YARN-9315-001.patch
>
>
> TestCapacitySchedulerMetrics fails intermittently because the assert check 
> happens before the allocation completes.






[jira] [Created] (YARN-9315) TestCapacitySchedulerMetrics fails intermittently

2019-02-20 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created YARN-9315:
---

 Summary: TestCapacitySchedulerMetrics fails intermittently
 Key: YARN-9315
 URL: https://issues.apache.org/jira/browse/YARN-9315
 Project: Hadoop YARN
  Issue Type: Test
  Components: capacity scheduler
Affects Versions: 3.1.2
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


TestCapacitySchedulerMetrics fails intermittently because the assert check 
happens before the allocation completes.






[jira] [Commented] (YARN-8132) Final Status of applications shown as UNDEFINED in ATS app queries

2019-02-20 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772861#comment-16772861
 ] 

Prabhu Joseph commented on YARN-8132:
-

The test case failures are not related. {{TestCapacitySchedulerMetrics}} fails 
intermittently because the assert check happens before the allocation 
completes.
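
For illustration, a minimal sketch of how such a race can be closed in the 
test, assuming Hadoop's {{GenericTestUtils#waitFor}} utility and a hypothetical 
metrics accessor (a sketch only, not the actual patch):

{code:java}
import org.apache.hadoop.test.GenericTestUtils;

// Poll until the scheduler metrics reflect the allocation instead of
// asserting immediately after triggering it. 'metrics' and
// 'getAllocatedContainers()' are hypothetical placeholders here.
GenericTestUtils.waitFor(
    () -> metrics.getAllocatedContainers() == 1, // condition to reach
    100,      // re-check every 100 ms
    10_000);  // time out (and fail the test) after 10 seconds
{code}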

> Final Status of applications shown as UNDEFINED in ATS app queries
> --
>
> Key: YARN-8132
> URL: https://issues.apache.org/jira/browse/YARN-8132
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: ATSv2, timelineservice
>Reporter: Charan Hebri
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: YARN-8132-001.patch, YARN-8132-002.patch, 
> YARN-8132-003.patch, YARN-8132-004.patch
>
>
> Final Status is shown as UNDEFINED for applications that are KILLED/FAILED. A 
> sample request/response with the INFO field for an application:
> {noformat}
> 2018-04-09 13:10:02,126 INFO  reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:getApp(1693)) - Received URL 
> /ws/v2/timeline/apps/application_1523259757659_0003?fields=INFO from user 
> hrt_qa
> 2018-04-09 13:10:02,156 INFO  reader.TimelineReaderWebServices 
> (TimelineReaderWebServices.java:getApp(1716)) - Processed URL 
> /ws/v2/timeline/apps/application_1523259757659_0003?fields=INFO (Took 30 
> ms.){noformat}
> {noformat}
> {
>   "metrics": [],
>   "events": [],
>   "createdtime": 1523263360719,
>   "idprefix": 0,
>   "id": "application_1523259757659_0003",
>   "type": "YARN_APPLICATION",
>   "info": {
> "YARN_APPLICATION_CALLER_CONTEXT": "CLI",
> "YARN_APPLICATION_DIAGNOSTICS_INFO": "Application 
> application_1523259757659_0003 was killed by user xxx_xx at XXX.XXX.XXX.XXX",
> "YARN_APPLICATION_FINAL_STATUS": "UNDEFINED",
> "YARN_APPLICATION_NAME": "Sleep job",
> "YARN_APPLICATION_USER": "hrt_qa",
> "YARN_APPLICATION_UNMANAGED_APPLICATION": false,
> "FROM_ID": 
> "yarn-cluster!hrt_qa!test_flow!1523263360719!application_1523259757659_0003",
> "UID": "yarn-cluster!application_1523259757659_0003",
> "YARN_APPLICATION_VIEW_ACLS": " ",
> "YARN_APPLICATION_SUBMITTED_TIME": 1523263360718,
> "YARN_AM_CONTAINER_LAUNCH_COMMAND": [
>   "$JAVA_HOME/bin/java -Djava.io.tmpdir=$PWD/tmp 
> -Dlog4j.configuration=container-log4j.properties 
> -Dyarn.app.container.log.dir= -Dyarn.app.container.log.filesize=0 
> -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog 
> -Dhdp.version=3.0.0.0-1163 -Xmx819m -Dhdp.version=3.0.0.0-1163 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/stdout 
> 2>/stderr "
> ],
> "YARN_APPLICATION_QUEUE": "default",
> "YARN_APPLICATION_TYPE": "MAPREDUCE",
> "YARN_APPLICATION_PRIORITY": 0,
> "YARN_APPLICATION_LATEST_APP_ATTEMPT": 
> "appattempt_1523259757659_0003_01",
> "YARN_APPLICATION_TAGS": [
>   "timeline_flow_name_tag:test_flow"
> ],
> "YARN_APPLICATION_STATE": "KILLED"
>   },
>   "configs": {},
>   "isrelatedto": {},
>   "relatesto": {}
> }{noformat}
> This differs from what the Resource Manager reports: for KILLED applications 
> the final status is KILLED, and for FAILED applications it is FAILED. This 
> behavior is seen in ATSv2 as well as in older versions of ATS.






[jira] [Comment Edited] (YARN-9313) Support asynchronized scheduling mode and multi-node lookup mechanism for scheduler activities

2019-02-20 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772781#comment-16772781
 ] 

Tao Yang edited comment on YARN-9313 at 2/20/19 9:38 AM:
-

Descriptions of the key changes in this patch are as follows; hope someone can 
help with the review:
 1. Add a fake node id named MULTI_NODES_AGENT in ActivitiesManager to 
represent multiple nodes.
 2. Place the start/finish points of scheduler activities in front of/after the 
allocation based on single node (input node is a real node) or multiple nodes 
(input node is ActivitiesManager#MULTI_NODES_AGENT) in 
CapacityScheduler#allocateContainersToNode instead of 
CapacityScheduler#nodeUpdate, to expand the applicable scenarios via unified 
entrance and exit.
 3. After initializing activities, activeRecordedNodes should remove current 
active node in ActivitiesManager#startNodeUpdateRecording to make sure current 
activities process can only be started once.
 4. Maintain the relationships among the input node, the nodeId key of 
recordingNodeAllocation, and the nodeId in the activities info. In the 
multi-node placement scenario the input node can be a specific node or null; 
the nodeId key of recordingNodeAllocation should be 
ActivitiesManager#MULTI_NODES_AGENT, and the nodeId in the activities info 
should be either a specific node or ActivitiesManager#MULTI_NODES_AGENT. Thus 
we need to derive the correct nodeId for the recording key and for the 
activities info from the input node: (1) the nodeId should be the nodeId of 
the input node when it is not null, and should be 
ActivitiesManager#MULTI_NODES_AGENT when the input node is null and multi-node 
lookup is enabled; the relevant places in ActivitiesLogger are updated 
accordingly. (2) When recording activities, the nodeId in the activities info 
can be a specific node while the nodeId key of recordingNodeAllocation must be 
ActivitiesManager#MULTI_NODES_AGENT, so we need to derive the correct 
recording key at the head of ActivitiesManager#getCurrentNodeAllocation while 
still recording the input node's nodeId in the activities info.
 5. Update the if clauses at the head of several methods in ActivitiesLogger 
to relax the restrictions (currently applied only to non-null nodes) on 
scheduler activities.
 6. ActivitiesManager#recordingNodesAllocation should be updated to be a 
thread-local variable to avoid recording mixed activities from multiple 
scheduling processes in asynchronized scheduling mode.
 7. Add TestActivitiesManager to test multiple threads can run without 
interference for normal scenario and multi-nodes enabled scenario.
 8. Update check logic in 
TestRMWebServicesSchedulerActivities#testAssignMultipleContainersPerNodeHeartbeat
 since collection logic of scheduler activities changed after this patch and 
only one allocation should be recorded for all scenarios.
 9. Add TestRMWebServicesSchedulerActivitiesWithMultiNodesEnabled to test 
recording scheduler activities with multi-nodes enabled.


was (Author: tao yang):
Descriptions of key changes in this patch are as follows, hope someone can help 
for the review:
 1. Add a fake node id named MULTI_NODES_AGENT in ActivitiesManager to 
represent multiple nodes.
 2. Place the start/finish points of scheduler activities in front of/after the 
allocation based on single node (input node is a real node) or multiple nodes 
(input node is ActivitiesManager#MULTI_NODES_AGENT) in 
CapacityScheduler#allocateContainersToNode instead of 
CapacityScheduler#nodeUpdate, to expand the applicable scenarios via unified 
entrance and exit.
 3. After initializing activities, activeRecordedNodes should remove current 
active node in ActivitiesManager#startNodeUpdateRecording to make sure current 
activities process can only be started once.
 4. Maintain the relationships between input node and recording key. For 
multi-nodes placement scenario, input node can be a special node or null, the 
nodeId in recordingNodeAllocation should be ActivitiesManager#MULTI_NODES_AGENT 
and the nodeId in activities info should be a special node or 
ActivitiesManager#MULTI_NODES_AGENT. Thus we need to get correct nodeId in 
recording key or nodeId in activities info based on input node: (1) nodeId 
should be the nodeId of input node which is not null, and should be 
ActivitiesManager#MULTI_NODES_AGENT when input node is null meanwhile 
multi-nodes is enabled, somewhere should be updated properly in 
ActivitiesLogger. (2) When recording activities, nodeId in activities info 
could be a special node but in recordingNodeAllocation nodeId should be 
ActivitiesManager#MULTI_NODES_AGENT, so that we need to get correct recording 
key at the head of ActivitiesManager#getCurrentNodeAllocation and still 
recording the nodeId of input node in activities info.
 5. Update the if clauses at the head of several methods in ActivitiesLogger to 
relax restrictions(only for non-null node now) on scheduler activities.
 6. ActivitiesManager#recordingNodesAllocation s

[jira] [Comment Edited] (YARN-9313) Support asynchronized scheduling mode and multi-node lookup mechanism for scheduler activities

2019-02-20 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772781#comment-16772781
 ] 

Tao Yang edited comment on YARN-9313 at 2/20/19 9:35 AM:
-

Descriptions of the key changes in this patch are as follows; hope someone can 
help with the review:
 1. Add a fake node id named MULTI_NODES_AGENT in ActivitiesManager to 
represent multiple nodes.
 2. Place the start/finish points of scheduler activities in front of/after the 
allocation based on single node (input node is a real node) or multiple nodes 
(input node is ActivitiesManager#MULTI_NODES_AGENT) in 
CapacityScheduler#allocateContainersToNode instead of 
CapacityScheduler#nodeUpdate, to expand the applicable scenarios via unified 
entrance and exit.
 3. After initializing activities, activeRecordedNodes should remove current 
active node in ActivitiesManager#startNodeUpdateRecording to make sure current 
activities process can only be started once.
 4. Maintain the relationships between input node and recording key. For 
multi-nodes placement scenario, input node can be a special node or null, the 
nodeId in recordingNodeAllocation should be ActivitiesManager#MULTI_NODES_AGENT 
and the nodeId in activities info should be a special node or 
ActivitiesManager#MULTI_NODES_AGENT. Thus we need to get correct nodeId in 
recording key or nodeId in activities info based on input node: (1) nodeId 
should be the nodeId of input node which is not null, and should be 
ActivitiesManager#MULTI_NODES_AGENT when input node is null meanwhile 
multi-nodes is enabled, somewhere should be updated properly in 
ActivitiesLogger. (2) When recording activities, nodeId in activities info 
could be a special node but in recordingNodeAllocation nodeId should be 
ActivitiesManager#MULTI_NODES_AGENT, so that we need to get correct recording 
key at the head of ActivitiesManager#getCurrentNodeAllocation and still 
recording the nodeId of input node in activities info.
 5. Update the if clauses at the head of several methods in ActivitiesLogger to 
relax restrictions(only for non-null node now) on scheduler activities.
 6. ActivitiesManager#recordingNodesAllocation should be updated to be a 
thread-local variable to avoid recording mixed activities from multiple 
scheduling processes in asynchronized scheduling mode.
 7. Add TestActivitiesManager to test multiple threads can run without 
interference for normal scenario and multi-nodes enabled scenario.
 8. Update check logic in 
TestRMWebServicesSchedulerActivities#testAssignMultipleContainersPerNodeHeartbeat
 since collection logic of scheduler activities changed after this patch and 
only one allocation should be recorded for all scenarios.
 9. Add TestRMWebServicesSchedulerActivitiesWithMultiNodesEnabled to test 
recording scheduler activities with multi-nodes enabled.


was (Author: tao yang):
Descriptions of key changes in this patch are as follows, hope someone can help 
for the review:
1. Add a fake node id named MULTI_NODES_AGENT in ActivitiesManager to represent 
multiple nodes.
2. Place the start/finish points of scheduler activities in front of/after the 
allocation based on single node (input node is a real node) or multiple nodes 
(input node is ActivitiesManager#MULTI_NODES_AGENT) in 
CapacityScheduler#allocateContainersToNode instead of 
CapacityScheduler#nodeUpdate, to expand the applicable scenarios via unified 
entrance and exit.
3. After initializing activities, activeRecordedNodes should remove current 
active node in ActivitiesManager#startNodeUpdateRecording to make sure current 
activities process can only be started once.
4. Maintain the relationships between input node and activities key. For 
multi-nodes placement scenario, input node can be a special node or null, the 
activities index should be ActivitiesManager#MULTI_NODES_AGENT and activities 
info should be a special node or ActivitiesManager#MULTI_NODES_AGENT. Thus we 
need to transform nodeId somewhere to make it work: (1) Input nodeId should be 
a special nodeId if input node is not null and should be 
ActivitiesManager#MULTI_NODES_AGENT if input node is null and multi-nodes is 
recording, input nodeId should be updated properly in ActivitiesLogger. (2) 
When recording activities, input node could be a special node but activities 
key should be ActivitiesManager#MULTI_NODES_AGENT, so that we need to get 
correct recording key at the head of ActivitiesManager#getCurrentNodeAllocation 
and still recording the special nodeId in activities info.
5. Update the if clauses at the head of several methods in ActivitiesLogger to 
relax restrictions(only for non-null node now) on scheduler activities.
6. ActivitiesManager#recordingNodesAllocation should be updated to be a 
thread-local variable to avoid recording mixed activities from multiple 
scheduling processes in asynchronized scheduling mode.
7. Add TestActivitiesM

[jira] [Commented] (YARN-8821) [YARN-8851] GPU hierarchy/topology scheduling support based on pluggable device framework

2019-02-20 Thread Weiwei Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772807#comment-16772807
 ] 

Weiwei Yang commented on YARN-8821:
---

Thanks [~tangzhankun], LGTM. +1 to v10 patch.

I will commit this patch tomorrow if no further comments from others.

> [YARN-8851] GPU hierarchy/topology scheduling support based on pluggable 
> device framework
> -
>
> Key: YARN-8821
> URL: https://issues.apache.org/jira/browse/YARN-8821
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
> Attachments: GPUTopologyPerformance.png, YARN-8821-trunk.001.patch, 
> YARN-8821-trunk.002.patch, YARN-8821-trunk.003.patch, 
> YARN-8821-trunk.004.patch, YARN-8821-trunk.005.patch, 
> YARN-8821-trunk.006.patch, YARN-8821-trunk.007.patch, 
> YARN-8821-trunk.008.patch, YARN-8821-trunk.009.patch, 
> YARN-8821-trunk.010.patch
>
>
> h2. Background
> GPU topology affects performance. There's been a discussion in YARN-7481. But 
> we'd like to move related discussions here.
> And please note that YARN-8851 will provide a pluggable device framework 
> which supports plugging in a custom scheduler. Based on the framework, the 
> GPU plugin could have its own topology scheduler.
> h2. Details of the proposed scheduling algorithm
> The proposed patch has a topology algorithm implemented as below:
> *Step 1*. When allocating devices, parse the output of "nvidia-smi topo -m" 
> to build a hash map whose keys are pairs of GPUs and whose values are the 
> communication cost between the two. The map is like \{"0 - 1"=> 2, "0 - 
> 2"=>4, ...} which means the minimum cost between GPU 0 and GPU 1 is 2. The 
> cost is set based on the connection type.
> *Step 2*. It then constructs and caches a _+cost table+_ which holds the 
> cost of every combination of GPUs. The cost table is a map whose structure 
> is like
> {code:java}
> { 2=>{[0,1]=>2,..},
>   3=>{[0,1,2]=>10,..},
>   4=>{[0,1,2,3]=>18}}.
> {code}
> The key of the outer map is the count of GPUs; its value is a map whose keys 
> are combinations of GPUs and whose values are the calculated communication 
> costs of those combinations. The cost of a combination is the sum of the 
> costs of all non-duplicate GPU pairs. For instance, the total cost of GPUs 
> [0,1,2] is the sum of the costs "0 - 1", "0 - 2" and "1 - 2", each taken 
> from the map built in step 1.
> *Step 3*. After the cost table is built, when allocating GPUs based on 
> topology, we provide two policies which a container can set through the 
> environment variable "NVIDIA_TOPO_POLICY". The value can be either "PACK" or 
> "SPREAD". "PACK" prefers faster GPU-GPU communication; "SPREAD" prefers 
> faster CPU-GPU communication (since the GPUs then do not share the same bus 
> to the CPU). The key difference between the two policies is the sort order 
> of the inner map in the cost table. For instance, assume 2 GPUs are wanted. 
> costTable.get(2) returns a map containing all combinations of two GPUs and 
> their costs. If the policy is "PACK", we sort the map by cost in ascending 
> order; the first entry is the pair with the minimum GPU-GPU cost. If the 
> policy is "SPREAD", we sort in descending order and take the first entry, 
> which has the highest GPU-GPU cost and therefore the lowest CPU-GPU cost.
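> As a minimal sketch of the lookup in steps 2-3 (illustrative names only, 
> under the map shapes shown above; not the patch's actual code):
> {code:java}
> import java.util.ArrayList;
> import java.util.Comparator;
> import java.util.List;
> import java.util.Map;
>
> public class GpuTopoPolicySketch {
>   // Cost of one combination = sum over all non-duplicate pairs (step 2),
>   // using the pair-cost map parsed from "nvidia-smi topo -m" (step 1).
>   static int combinationCost(Map<String, Integer> pairCost, List<Integer> gpus) {
>     int sum = 0;
>     for (int i = 0; i < gpus.size(); i++) {
>       for (int j = i + 1; j < gpus.size(); j++) {
>         sum += pairCost.get(gpus.get(i) + " - " + gpus.get(j));
>       }
>     }
>     return sum;
>   }
>
>   // Step 3: sort the inner map of costTable.get(count) by cost and take the
>   // first entry. PACK -> ascending order, SPREAD -> descending order.
>   static List<Integer> pick(Map<List<Integer>, Integer> costsForCount, String policy) {
>     Comparator<Map.Entry<List<Integer>, Integer>> byCost = Map.Entry.comparingByValue();
>     if ("SPREAD".equals(policy)) {
>       byCost = byCost.reversed();
>     }
>     List<Map.Entry<List<Integer>, Integer>> sorted = new ArrayList<>(costsForCount.entrySet());
>     sorted.sort(byCost);
>     return sorted.isEmpty() ? null : sorted.get(0).getKey();
>   }
> }
> {code}
> A container would then request, e.g., NVIDIA_TOPO_POLICY=PACK in its 
> environment to receive the lowest-communication-cost combination.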
> h2. Estimation of the algorithm
> An initial analysis of the topology scheduling algorithm (using the PACK 
> policy), based on performance tests on an AWS EC2 instance with 8 GPU cards 
> (P3), has been done. The figure below shows the performance gain of the 
> topology scheduling algorithm's allocation (PACK policy).
> !GPUTopologyPerformance.png!  
> Some of the conclusions are:
> 1. The topology between GPUs impacts performance dramatically. The best GPU 
> combination yields a *5% to 185% performance gain* across test cases with 
> various factors including CNN model, batch size, GPU subset, etc. The 
> scheduling algorithm should aim to match this optimum.
> 2. The "inception3" and "resnet50" networks seem not to be 
> topology-sensitive. Topology scheduling can only potentially get *about 6.8% 
> to 10%* speedup in the best cases.
> 3. Our current version of the topology scheduling algorithm can achieve a 
> *6.8% to 177.1% performance gain in the best cases. On average, it also 
> outperforms the median performance (by 0.8% to 28.2%).*
> *4. And the algorithm's allocations match the fastest GPUs needed by "vgg16" 
> best*.
>  
> In summary, the GPU topology scheduling algorithm is effective and can 
> potentially get a 6.8% to 185% performance gain in the best cases and 1% to 
> 30% on average.
>  *That is about a 3X maximum compared to a random GPU scheduling algorithm in 
> a specific

[jira] [Commented] (YARN-8132) Final Status of applications shown as UNDEFINED in ATS app queries

2019-02-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772799#comment-16772799
 ] 

Hadoop QA commented on YARN-8132:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m  8s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
29s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
36s{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 0 new + 225 unchanged - 2 fixed = 225 total (was 227) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 43s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 93m 59s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}145m 51s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.constraint.TestPlacementConstraintsUtil
 |
|   | hadoop.yarn.server.resourcemanager.TestCapacitySchedulerMetrics |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-8132 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12959380/YARN-8132-004.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux e6d86bb5b20b 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 
5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 1d30fd9 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/23449/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
 |
|  Test Results | 
https://builds.ap

[jira] [Updated] (YARN-9313) Support asynchronized scheduling mode and multi-node lookup mechanism for scheduler activities

2019-02-20 Thread Tao Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-9313:
---
Attachment: YARN-9313.001.patch

> Support asynchronized scheduling mode and multi-node lookup mechanism for 
> scheduler activities
> --
>
> Key: YARN-9313
> URL: https://issues.apache.org/jira/browse/YARN-9313
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9313.001.patch
>
>
> [Design 
> doc|https://docs.google.com/document/d/1pwf-n3BCLW76bGrmNPM4T6pQ3vC4dVMcN2Ud1hq1t2M/edit#heading=h.d2ru7sigsi7j]






[jira] [Updated] (YARN-9313) Support asynchronized scheduling mode and multi-node lookup mechanism for scheduler activities

2019-02-20 Thread Tao Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-9313:
---
Attachment: (was: YARN-9313.001.patch)

> Support asynchronized scheduling mode and multi-node lookup mechanism for 
> scheduler activities
> --
>
> Key: YARN-9313
> URL: https://issues.apache.org/jira/browse/YARN-9313
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>
> [Design 
> doc|https://docs.google.com/document/d/1pwf-n3BCLW76bGrmNPM4T6pQ3vC4dVMcN2Ud1hq1t2M/edit#heading=h.d2ru7sigsi7j]






[jira] [Commented] (YARN-9313) Support asynchronized scheduling mode and multi-node lookup mechanism for scheduler activities

2019-02-20 Thread Tao Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-9313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772781#comment-16772781
 ] 

Tao Yang commented on YARN-9313:


Descriptions of the key changes in this patch are as follows; hope someone can 
help with the review:
1. Add a fake node id named MULTI_NODES_AGENT in ActivitiesManager to represent 
multiple nodes.
2. Place the start/finish points of scheduler activities in front of/after the 
allocation based on single node (input node is a real node) or multiple nodes 
(input node is ActivitiesManager#MULTI_NODES_AGENT) in 
CapacityScheduler#allocateContainersToNode instead of 
CapacityScheduler#nodeUpdate, to expand the applicable scenarios via unified 
entrance and exit.
3. After initializing activities, activeRecordedNodes should remove current 
active node in ActivitiesManager#startNodeUpdateRecording to make sure current 
activities process can only be started once.
4. Maintain the relationships between input node and activities key. For 
multi-nodes placement scenario, input node can be a special node or null, the 
activities index should be ActivitiesManager#MULTI_NODES_AGENT and activities 
info should be a special node or ActivitiesManager#MULTI_NODES_AGENT. Thus we 
need to transform nodeId somewhere to make it work: (1) Input nodeId should be 
a special nodeId if input node is not null and should be 
ActivitiesManager#MULTI_NODES_AGENT if input node is null and multi-nodes is 
recording, input nodeId should be updated properly in ActivitiesLogger. (2) 
When recording activities, input node could be a special node but activities 
key should be ActivitiesManager#MULTI_NODES_AGENT, so that we need to get 
correct recording key at the head of ActivitiesManager#getCurrentNodeAllocation 
and still recording the special nodeId in activities info.
5. Update the if clauses at the head of several methods in ActivitiesLogger to 
relax the restrictions (currently applied only to non-null nodes) on scheduler 
activities.
6. ActivitiesManager#recordingNodesAllocation should be updated to be a 
thread-local variable to avoid recording mixed activities from multiple 
scheduling processes in asynchronized scheduling mode (see the sketch after 
this list).
7. Add TestActivitiesManager to test multiple threads can run without 
interference for normal scenario and multi-nodes enabled scenario.
8. Update check logic in 
TestRMWebServicesSchedulerActivities#testAssignMultipleContainersPerNodeHeartbeat
 since collection logic of scheduler activities changed after this patch and 
only one allocation should be recorded for all scenarios.
9. Add TestRMWebServicesSchedulerActivitiesWithMultiNodesEnabled to test 
recording scheduler activities with multi-nodes enabled.
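
A minimal sketch of the thread-local recording map from item 6, with 
simplified placeholder types (the real ActivitiesManager field and value types 
differ):

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ActivitiesRecordingSketch {
  // Each scheduling thread gets its own nodeId -> allocations map, so
  // concurrent scheduling passes in asynchronized scheduling mode cannot
  // interleave their recorded activities.
  private final ThreadLocal<Map<String, List<String>>> recordingNodesAllocation =
      ThreadLocal.withInitial(HashMap::new);

  public void record(String nodeId, String allocation) {
    recordingNodesAllocation.get()
        .computeIfAbsent(nodeId, k -> new ArrayList<>())
        .add(allocation);
  }

  // A thread only ever reads back what it recorded itself.
  public Map<String, List<String>> currentThreadRecordings() {
    return recordingNodesAllocation.get();
  }
}
{code}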

> Support asynchronized scheduling mode and multi-node lookup mechanism for 
> scheduler activities
> --
>
> Key: YARN-9313
> URL: https://issues.apache.org/jira/browse/YARN-9313
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9313.001.patch
>
>
> [Design 
> doc|https://docs.google.com/document/d/1pwf-n3BCLW76bGrmNPM4T6pQ3vC4dVMcN2Ud1hq1t2M/edit#heading=h.d2ru7sigsi7j]






[jira] [Updated] (YARN-9313) Support asynchronized scheduling mode and multi-node lookup mechanism for scheduler activities

2019-02-20 Thread Tao Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-9313:
---
Attachment: (was: YARN-9313.001.patch)

> Support asynchronized scheduling mode and multi-node lookup mechanism for 
> scheduler activities
> --
>
> Key: YARN-9313
> URL: https://issues.apache.org/jira/browse/YARN-9313
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9313.001.patch
>
>
> [Design 
> doc|https://docs.google.com/document/d/1pwf-n3BCLW76bGrmNPM4T6pQ3vC4dVMcN2Ud1hq1t2M/edit#heading=h.d2ru7sigsi7j]






[jira] [Updated] (YARN-9313) Support asynchronized scheduling mode and multi-node lookup mechanism for scheduler activities

2019-02-20 Thread Tao Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-9313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Yang updated YARN-9313:
---
Attachment: YARN-9313.001.patch

> Support asynchronized scheduling mode and multi-node lookup mechanism for 
> scheduler activities
> --
>
> Key: YARN-9313
> URL: https://issues.apache.org/jira/browse/YARN-9313
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
> Attachments: YARN-9313.001.patch
>
>
> [Design 
> doc|https://docs.google.com/document/d/1pwf-n3BCLW76bGrmNPM4T6pQ3vC4dVMcN2Ud1hq1t2M/edit#heading=h.d2ru7sigsi7j]






[jira] [Commented] (YARN-8891) Documentation of the pluggable device framework

2019-02-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772750#comment-16772750
 ] 

Hadoop QA commented on YARN-8891:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
27m 16s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
15s{color} | {color:red} hadoop-yarn-site in the patch failed. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 23 line(s) that end in whitespace. Use 
git apply --whitespace=fix <>. Refer 
https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m  7s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
27s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 41m  6s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | YARN-8891 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12959389/YARN-8891-trunk.003.patch
 |
| Optional Tests |  dupname  asflicense  mvnsite  |
| uname | Linux 067adebbe1be 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 1d30fd9 |
| maven | version: Apache Maven 3.3.9 |
| mvnsite | 
https://builds.apache.org/job/PreCommit-YARN-Build/23452/artifact/out/patch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-site.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/23452/artifact/out/whitespace-eol.txt
 |
| Max. process+thread count | 447 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/23452/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Documentation of the pluggable device framework
> ---
>
> Key: YARN-8891
> URL: https://issues.apache.org/jira/browse/YARN-8891
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Zhankun Tang
>Assignee: Zhankun Tang
>Priority: Major
> Attachments: YARN-8891-trunk.001.patch, YARN-8891-trunk.002.patch, 
> YARN-8891-trunk.003.patch
>
>



