[jira] [Updated] (YARN-6062) nodemanager memory leak

2017-01-08 Thread gehaijiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gehaijiang updated YARN-6062:
-
Attachment: smaps.84971.txt
jstack.84971.txt
jmap.84971.txt

Attached the jmap, jstack, and /proc/pid/smaps file information.
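
As a quick aid for reading the attached smaps file, here is a minimal sketch (the input path is a placeholder for the attached smaps.84971.txt) that totals the Rss lines; comparing that total against the configured 2 GB heap helps show whether the growth is on-heap or native:

{code}
// Minimal sketch: sum the "Rss:" lines of a saved /proc/<pid>/smaps file
// (values are reported in kB) to get the total resident memory of the process.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class SmapsRssSum {
  public static void main(String[] args) throws IOException {
    long rssKb = Files.lines(Paths.get(args[0]))   // e.g. smaps.84971.txt
        .filter(line -> line.startsWith("Rss:"))
        .mapToLong(line -> Long.parseLong(line.replaceAll("[^0-9]", "")))
        .sum();
    System.out.println("Total RSS: " + (rssKb / 1024) + " MB");
  }
}
{code}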

> nodemanager memory leak
> ---
>
> Key: YARN-6062
> URL: https://issues.apache.org/jira/browse/YARN-6062
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: gehaijiang
> Attachments: jmap.84971.txt, jstack.84971.txt, smaps.84971.txt
>
>
>  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM  TIME+  COMMAND
>  8986 data  20   0 21.3g  19g 7376 S  5.5 20.7   2458:09 java
> 38432 data  20   0  9.8g 7.9g 6300 S 95.5  8.4  35273:23 java
>  6653 data  20   0 4558m 3.4g  10m S  9.2  3.6   6640:37 java
> $ jps
> 6653 NodeManager
> NodeManager memory keeps growing and has reached 10 GB.
> The NodeManager heap is configured in yarn-env.sh (2 GB):
> YARN_NODEMANAGER_OPTS=" -Xms2048m -Xmn768m 
> -Xloggc:${YARN_LOG_DIR}/nodemanager.gc.log -XX:+PrintGCDateStamps 
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"






[jira] [Commented] (YARN-6062) nodemanager memory leak

2017-01-08 Thread gehaijiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15809052#comment-15809052
 ] 

gehaijiang commented on YARN-6062:
--

Already attached

> nodemanager memory leak
> ---
>
> Key: YARN-6062
> URL: https://issues.apache.org/jira/browse/YARN-6062
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: gehaijiang
> Attachments: jmap.84971.txt, jstack.84971.txt, smaps.84971.txt
>
>
>  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM  TIME+  COMMAND
>  8986 data  20   0 21.3g  19g 7376 S  5.5 20.7   2458:09 java
> 38432 data  20   0  9.8g 7.9g 6300 S 95.5  8.4  35273:23 java
>  6653 data  20   0 4558m 3.4g  10m S  9.2  3.6   6640:37 java
> $ jps
> 6653 NodeManager
> NodeManager memory keeps growing and has reached 10 GB.
> The NodeManager heap is configured in yarn-env.sh (2 GB):
> YARN_NODEMANAGER_OPTS=" -Xms2048m -Xmn768m 
> -Xloggc:${YARN_LOG_DIR}/nodemanager.gc.log -XX:+PrintGCDateStamps 
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"






[jira] [Commented] (YARN-6062) nodemanager memory leak

2017-01-08 Thread gehaijiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15809058#comment-15809058
 ] 

gehaijiang commented on YARN-6062:
--

MAT information:

Class name (class ID) | Objects | Shallow size | Retained size
java.util.concurrent.ConcurrentHashMap (37247) | 530 | 41K | 25M
java.util.concurrent.ConcurrentHashMap$Segment[] (41421) | 530 | 78K | 25M
java.util.concurrent.ConcurrentHashMap$Segment (34044) | 5,938 | 278K | 25M
java.util.concurrent.ConcurrentHashMap$HashEntry[] (37906) | 5,938 | 1,239K | 25M
java.util.concurrent.ConcurrentHashMap$HashEntry (34043) | 41,944 | 1,966K | 25M
byte[] (35091) | 2,566 | 16,633K | 16M
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService (41148) | 1 | 0K | 11M
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl (37582) | 138 | 13K | 11M
org.apache.hadoop.mapred.ShuffleHandler$HttpPipelineFactory (45434) | 1 | 0K | 10M
org.apache.hadoop.mapred.ShuffleHandler$Shuffle (45435) | 1 | 0K | 10M
org.apache.hadoop.mapred.IndexCache (89561) | 1 | 0K | 10M

> nodemanager memory leak
> ---
>
> Key: YARN-6062
> URL: https://issues.apache.org/jira/browse/YARN-6062
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: gehaijiang
> Attachments: jmap.84971.txt, jstack.84971.txt, smaps.84971.txt
>
>
>  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM  TIME+  COMMAND
>  8986 data  20   0 21.3g  19g 7376 S  5.5 20.7   2458:09 java
> 38432 data  20   0  9.8g 7.9g 6300 S 95.5  8.4  35273:23 java
>  6653 data  20   0 4558m 3.4g  10m S  9.2  3.6   6640:37 java
> $ jps
> 6653 NodeManager
> NodeManager memory keeps growing and has reached 10 GB.
> The NodeManager heap is configured in yarn-env.sh (2 GB):
> YARN_NODEMANAGER_OPTS=" -Xms2048m -Xmn768m 
> -Xloggc:${YARN_LOG_DIR}/nodemanager.gc.log -XX:+PrintGCDateStamps 
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"






[jira] [Comment Edited] (YARN-6062) nodemanager memory leak

2017-01-08 Thread gehaijiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15809058#comment-15809058
 ] 

gehaijiang edited comment on YARN-6062 at 1/8/17 9:19 AM:
--

MAT information:

Heap memory used: 41M
Number of objects: 413,471
Number of classes: 6,033
Number of class loaders: 92
Number of GC roots: 1,904
File format: hprof
Date: 2016-12-27 10:38:18
Bitness: 64-bit

Class name (class ID) | Objects | Shallow size | Retained size
java.util.concurrent.ConcurrentHashMap (37247) | 530 | 41K | 25M
java.util.concurrent.ConcurrentHashMap$Segment[] (41421) | 530 | 78K | 25M
java.util.concurrent.ConcurrentHashMap$Segment (34044) | 5,938 | 278K | 25M
java.util.concurrent.ConcurrentHashMap$HashEntry[] (37906) | 5,938 | 1,239K | 25M
java.util.concurrent.ConcurrentHashMap$HashEntry (34043) | 41,944 | 1,966K | 25M
byte[] (35091) | 2,566 | 16,633K | 16M
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService (41148) | 1 | 0K | 11M
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl (37582) | 138 | 13K | 11M
org.apache.hadoop.mapred.ShuffleHandler$HttpPipelineFactory (45434) | 1 | 0K | 10M
org.apache.hadoop.mapred.ShuffleHandler$Shuffle (45435) | 1 | 0K | 10M
org.apache.hadoop.mapred.IndexCache (89561) | 1 | 0K | 10M


was (Author: gehaijiang):
MAT information:

Class name (class ID) | Objects | Shallow size | Retained size
java.util.concurrent.ConcurrentHashMap (37247) | 530 | 41K | 25M
java.util.concurrent.ConcurrentHashMap$Segment[] (41421) | 530 | 78K | 25M
java.util.concurrent.ConcurrentHashMap$Segment (34044) | 5,938 | 278K | 25M
java.util.concurrent.ConcurrentHashMap$HashEntry[] (37906) | 5,938 | 1,239K | 25M
java.util.concurrent.ConcurrentHashMap$HashEntry (34043) | 41,944 | 1,966K | 25M
byte[] (35091) | 2,566 | 16,633K | 16M
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService (41148) | 1 | 0K | 11M
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl (37582) | 138 | 13K | 11M
org.apache.hadoop.mapred.ShuffleHandler$HttpPipelineFactory (45434) | 1 | 0K | 10M
org.apache.hadoop.mapred.ShuffleHandler$Shuffle (45435) | 1 | 0K | 10M
org.apache.hadoop.mapred.IndexCache (89561) | 1 | 0K | 10M

> nodemanager memory leak
> ---
>
> Key: YARN-6062
> URL: https://issues.apache.org/jira/browse/YARN-6062
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: gehaijiang
> Attachments: jmap.84971.txt, jstack.84971.txt, smaps.84971.txt
>
>
>  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM  TIME+  COMMAND
>  8986 data  20   0 21.3g  19g 7376 S  5.5 20.7   2458:09 java
> 38432 data  20   0  9.8g 7.9g 6300 S 95.5  8.4  35273:23 java
>  6653 data  20   0 4558m 3.4g  10m S  9.2  3.6   6640:37 java
> $ jps
> 6653 NodeManager
> NodeManager memory keeps growing and has reached 10 GB.
> The NodeManager heap is configured in yarn-env.sh (2 GB):
> YARN_NODEMANAGER_OPTS=" -Xms2048m -Xmn768m 
> -Xloggc:${YARN_LOG_DIR}/nodemanager.gc.log -XX:+PrintGCDateStamps 
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"






[jira] [Created] (YARN-6072) RM unable to start in secure mode

2017-01-08 Thread Bibin A Chundatt (JIRA)
Bibin A Chundatt created YARN-6072:
--

 Summary: RM unable to start in secure mode
 Key: YARN-6072
 URL: https://issues.apache.org/jira/browse/YARN-6072
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Bibin A Chundatt
Priority: Blocker


Resource manager is unable to start in secure mode

{code}
2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found 
resource hadoop-policy.xml at 
file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml
2017-01-08 14:27:29,918 INFO 
org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All
java.lang.NullPointerException
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
at 
org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
at 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
at 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2017-01-08 14:27:29,919 ERROR 
org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed 
so firing fatal event
org.apache.hadoop.ha.ServiceFailedException
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
at 
org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
at 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
at 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket 
Reader #1 for port 8033
2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
Exception handling the winning of election
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
at 
org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
at 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
at 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll 
during transition to Active
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311)
at 
org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
... 4 more
Caused by: org.apache.hadoop.ha.ServiceFailedException
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
... 5 more

{code}

ResourceManager services are added in the following order:
# EmbeddedElector
# AdminService

During ResourceManager service start(), EmbeddedElector starts first and 
invokes {{AdminService#refreshAll()}}, but {{AdminService#serviceStart()}} only 
runs after the {{ActiveStandbyElectorBasedElectorService}} start is complete. 
So {{AdminService#server}} is still *null*, which causes 
{{AdminService#refreshAll()}} to fail:
{code}
  if (getConfig().getBoolean(
  CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION,
  false)) {
refreshServiceAcls();
  }
{code}
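
The failing path can be reduced to a small, self-contained sketch (class, field, and method names here are illustrative only, not the actual Hadoop sources):

{code}
// Simplified sketch of the start-up ordering problem described above.
public class AdminServiceOrderingSketch {
  static class RpcServer {
    void refreshServiceAcl() { /* reload service ACLs */ }
  }

  private RpcServer server;          // created only in serviceStart()

  void serviceStart() {
    server = new RpcServer();        // runs only after the elector has started
  }

  // Reached from the elector's becomeActive() -> transitionToActive() -> refreshAll()
  void refreshServiceAcls() {
    // NullPointerException: serviceStart() has not run yet, so server is still null
    server.refreshServiceAcl();
  }

  public static void main(String[] args) {
    AdminServiceOrderingSketch admin = new AdminServiceOrderingSketch();
    admin.refreshServiceAcls();      // elector callback fires first -> NPE
    admin.serviceStart();            // would have created the server, but too late
  }
}
{code}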




[jira] [Updated] (YARN-6072) RM unable to start in secure mode

2017-01-08 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-6072:
---
Attachment: hadoop-secureuser-resourcemanager-vm1.log

Attaching logs for the same

cc [~rohithsharma] [~naganarasimha...@apache.org] [~ajithshetty]

> RM unable to start in secure mode
> -
>
> Key: YARN-6072
> URL: https://issues.apache.org/jira/browse/YARN-6072
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Priority: Blocker
> Attachments: hadoop-secureuser-resourcemanager-vm1.log
>
>
> Resource manager is unable to start in secure mode
> {code}
> 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found 
> resource hadoop-policy.xml at 
> file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml
> 2017-01-08 14:27:29,918 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,919 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed 
> so firing fatal event
> org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket 
> Reader #1 for port 8033
> 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll 
> during transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> ... 5 more
> {code}
> ResourceManager services are added in the following order:
> # EmbeddedElector
> # AdminService
> During ResourceManager service start(), EmbeddedElector starts first and 
> invokes {{AdminService#refreshAll()}}, but {{AdminService#serviceStart()}} only 
> runs after the {{ActiveStandbyElectorBasedElectorService}} start is

[jira] [Commented] (YARN-5988) RM unable to start in secure setup

2017-01-08 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15809068#comment-15809068
 ] 

Bibin A Chundatt commented on YARN-5988:


[~rohithsharma] raised YARN-6072 to track the same. 

> RM unable to start in secure setup
> --
>
> Key: YARN-5988
> URL: https://issues.apache.org/jira/browse/YARN-5988
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha1
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Blocker
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: YARN-5988.01.patch, YARN-5988.02.patch, 
> YARN-5988.03.patch, YARN-5988.04.patch, YARN-5988.05.patch, 
> hadoop-secureuser-resourcemanager-vm1.log
>
>
> When CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION=true
> RM is unable to start






[jira] [Updated] (YARN-5984) Refactor move application across queue's CS level implementation

2017-01-08 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-5984:
--
Attachment: YARN-5984.0002.patch

Attaching a new patch after rebase. [~leftnoteasy] [~rohithsharma], please 
take a look.

> Refactor move application across queue's CS level implementation
> 
>
> Key: YARN-5984
> URL: https://issues.apache.org/jira/browse/YARN-5984
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, resourcemanager
>Reporter: Sunil G
>Assignee: Sunil G
> Attachments: YARN-5984.0001.patch, YARN-5984.0002.patch
>
>
> Currently we use a top-level write lock in CS#moveApplication. We are also 
> using a few submission-time APIs in move. This jira will focus on coming 
> up with a cleaner implementation for moveApplication and will try to share 
> code with FS where possible.






[jira] [Commented] (YARN-5984) Refactor move application across queue's CS level implementation

2017-01-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15809243#comment-15809243
 ] 

Hadoop QA commented on YARN-5984:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
 The patch generated 0 new + 131 unchanged - 1 fixed = 131 total (was 132) 
{color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
13s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 41m 43s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 65m 46s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
|  |  Nullcheck of targetQueue at line 1158 of value previously dereferenced in 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.handleMoveToPlanQueue(Queue)
  At AbstractYarnScheduler.java:1158 of value previously dereferenced in 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.handleMoveToPlanQueue(Queue)
  At AbstractYarnScheduler.java:[line 1157] |
| Failed junit tests | 
hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-5984 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12846217/YARN-5984.0002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 7fa66b3e796c 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 
15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 679478d |
| Default J

[jira] [Commented] (YARN-6062) nodemanager memory leak

2017-01-08 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15809326#comment-15809326
 ] 

Rohith Sharma K S commented on YARN-6062:
-

It appears the attached details are from a different process than the NM 
described above. Could you confirm whether it is the same process or a different one?

> nodemanager memory leak
> ---
>
> Key: YARN-6062
> URL: https://issues.apache.org/jira/browse/YARN-6062
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.1
>Reporter: gehaijiang
> Attachments: jmap.84971.txt, jstack.84971.txt, smaps.84971.txt
>
>
>  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM  TIME+  COMMAND
>  8986 data  20   0 21.3g  19g 7376 S  5.5 20.7   2458:09 java
> 38432 data  20   0  9.8g 7.9g 6300 S 95.5  8.4  35273:23 java
>  6653 data  20   0 4558m 3.4g  10m S  9.2  3.6   6640:37 java
> $ jps
> 6653 NodeManager
> NodeManager memory keeps growing and has reached 10 GB.
> The NodeManager heap is configured in yarn-env.sh (2 GB):
> YARN_NODEMANAGER_OPTS=" -Xms2048m -Xmn768m 
> -Xloggc:${YARN_LOG_DIR}/nodemanager.gc.log -XX:+PrintGCDateStamps 
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"






[jira] [Updated] (YARN-5219) When an export var command fails in launch_container.sh, the full container launch should fail

2017-01-08 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-5219:
--
Attachment: YARN-5219.004.patch

Sorry for the rather long delay here; it fell off my radar.

Yes, that's correct. We can use "set -e" to validate only the env variables. After 
the export section in launch_container.sh, I think it's fine to add back 
"set +e".

Updating a patch for the same. Kindly review.

> When an export var command fails in launch_container.sh, the full container 
> launch should fail
> --
>
> Key: YARN-5219
> URL: https://issues.apache.org/jira/browse/YARN-5219
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Sunil G
> Attachments: YARN-5219-branch-2.001.patch, YARN-5219.001.patch, 
> YARN-5219.003.patch, YARN-5219.004.patch
>
>
> Today, a container fails if certain files fail to localize. However, if 
> certain env vars fail to get setup properly either due to bugs in the yarn 
> application or misconfiguration, the actual process launch still gets 
> triggered. This results in either confusing error messages if the process 
> fails to launch or worse yet the process launches but then starts behaving 
> wrongly if the env var is used to control some behavioral aspects. 
> In this scenario, the issue was reproduced by trying to do export 
> abc="$\{foo.bar}" which is invalid as var names cannot contain "." in bash. 






[jira] [Commented] (YARN-6066) Opportunistic containers minor fixes: API annotations and config parameter changes

2017-01-08 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15809617#comment-15809617
 ] 

Arun Suresh commented on YARN-6066:
---

Committing this to branch-2 shortly since these changes have already been 
reviewed in YARN-6041.

> Opportunistic containers minor fixes: API annotations and config parameter 
> changes
> --
>
> Key: YARN-6066
> URL: https://issues.apache.org/jira/browse/YARN-6066
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>Priority: Minor
> Attachments: YARN-6066-branch-2.001.patch
>
>
> Creating this to capture changes suggested by [~leftnoteasy] and [~kasha] in 
> YARN-6041 in its own JIRA.






[jira] [Commented] (YARN-5219) When an export var command fails in launch_container.sh, the full container launch should fail

2017-01-08 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15809650#comment-15809650
 ] 

Allen Wittenauer commented on YARN-5219:


bq. After the export section in launch_container.sh, i think its fine to add 
back "set +e".

Why?  If the container launch script has failures in it, we should be aborting. 
 If those failures are ok, then we should be writing better bash code that 
takes optional failures into consideration.

> When an export var command fails in launch_container.sh, the full container 
> launch should fail
> --
>
> Key: YARN-5219
> URL: https://issues.apache.org/jira/browse/YARN-5219
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Sunil G
> Attachments: YARN-5219-branch-2.001.patch, YARN-5219.001.patch, 
> YARN-5219.003.patch, YARN-5219.004.patch
>
>
> Today, a container fails if certain files fail to localize. However, if 
> certain env vars fail to get setup properly either due to bugs in the yarn 
> application or misconfiguration, the actual process launch still gets 
> triggered. This results in either confusing error messages if the process 
> fails to launch or worse yet the process launches but then starts behaving 
> wrongly if the env var is used to control some behavioral aspects. 
> In this scenario, the issue was reproduced by trying to do export 
> abc="$\{foo.bar}" which is invalid as var names cannot contain "." in bash. 






[jira] [Commented] (YARN-6066) Opportunistic containers minor fixes: API annotations and config parameter changes

2017-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15809732#comment-15809732
 ] 

Hudson commented on YARN-6066:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11086 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11086/])
YARN-6066. Opportunistic containers Minor fixes : API annotations, (arun 
suresh: rev 85826f6ca5a6d06b711a6805f7a1a6788852db05)
* (edit) 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
* (edit) 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestMROpportunisticMaps.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ContainerState.java
* (edit) 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/OpportunisticContainers.md


> Opportunistic containers minor fixes: API annotations and config parameter 
> changes
> --
>
> Key: YARN-6066
> URL: https://issues.apache.org/jira/browse/YARN-6066
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
>Priority: Minor
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: YARN-6066-branch-2.001.patch
>
>
> Creating this to capture changes suggested by [~leftnoteasy] and [~kasha] in 
> YARN-6041 in its own JIRA.






[jira] [Commented] (YARN-3774) ZKRMStateStore should use Curator 3.0 and avail CuratorOp

2017-01-08 Thread Jordan Zimmerman (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15809944#comment-15809944
 ] 

Jordan Zimmerman commented on YARN-3774:


FYI - this will be fixed in the next release of Curator per CURATOR-200

> ZKRMStateStore should use Curator 3.0 and avail CuratorOp
> -
>
> Key: YARN-3774
> URL: https://issues.apache.org/jira/browse/YARN-3774
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Blocker
>
> YARN-2716 changes ZKRMStateStore to use Curator. Transactions added there are 
> somewhat involved, and could be improved using CuratorOp introduced in 
> Curator 3.0. Hadoop 3.0.0 would be a good time to upgrade the Curator version 
> and make this change. 
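
For reference, a rough sketch of the multi-operation style that CuratorOp (Curator 3.0) enables, as mentioned in the description above; the paths, payloads, and the {{storeApp}} helper are placeholders, not the actual ZKRMStateStore code:

{code}
// Rough sketch of the Curator 3.x CuratorOp transaction API; paths and data
// are placeholders and this is not the real ZKRMStateStore implementation.
import java.util.List;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.api.transaction.CuratorOp;
import org.apache.curator.framework.api.transaction.CuratorTransactionResult;

public class CuratorOpSketch {
  static List<CuratorTransactionResult> storeApp(CuratorFramework client,
      byte[] appState) throws Exception {
    CuratorOp create = client.transactionOp().create()
        .forPath("/rmstore/apps/app_0001", appState);
    CuratorOp touchParent = client.transactionOp().setData()
        .forPath("/rmstore/apps", new byte[0]);
    // Both operations are committed atomically as a single ZooKeeper multi call.
    return client.transaction().forOperations(create, touchParent);
  }
}
{code}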






[jira] [Created] (YARN-6073) Misuse of format specifier in Preconditions.checkArgument

2017-01-08 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created YARN-6073:
---

 Summary: Misuse of format specifier in Preconditions.checkArgument
 Key: YARN-6073
 URL: https://issues.apache.org/jira/browse/YARN-6073
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Yongjun Zhang
Priority: Trivial


RMAdminCLI.java

{code}
 int nLabels = map.get(nodeId).size();
  Preconditions.checkArgument(nLabels <= 1, "%d labels specified on host=%s"
  + ", please note that we do not support specifying multiple"
  + " labels on a single host for now.", nLabels, nodeIdStr);
{code}

The {{%d}} should be replaced with {{%s}}, per

https://google.github.io/guava/releases/19.0/api/docs/com/google/common/base/Preconditions.html
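
A sketch of the corrected call, with the same arguments as the snippet above and only the specifier changed, since Guava's Preconditions expands only {{%s}} placeholders:

{code}
// %d is not understood by Preconditions.checkArgument; use %s instead.
Preconditions.checkArgument(nLabels <= 1, "%s labels specified on host=%s"
    + ", please note that we do not support specifying multiple"
    + " labels on a single host for now.", nLabels, nodeIdStr);
{code}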









[jira] [Updated] (YARN-6068) Log aggregation get failed when NM restart even with recovery

2017-01-08 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-6068:
-
Priority: Blocker  (was: Critical)

> Log aggregation get failed when NM restart even with recovery
> -
>
> Key: YARN-6068
> URL: https://issues.apache.org/jira/browse/YARN-6068
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Blocker
> Attachments: YARN-6068-v2.patch, YARN-6068.patch
>
>
> The exception log is as follows:
> {noformat}
> 2017-01-05 19:16:36,352 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:abortLogAggregation(527)) - Aborting log 
> aggregation for application_1483640789847_0001
> 2017-01-05 19:16:36,352 WARN  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:run(399)) - Aggregation did not complete for 
> application application_1483640789847_0001
> 2017-01-05 19:16:36,353 WARN  application.ApplicationImpl 
> (ApplicationImpl.java:handle(461)) - Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> APPLICATION_LOG_HANDLING_FAILED at RUNNING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:459)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:64)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1084)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1076)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> 2017-01-05 19:16:36,355 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(464)) - Application 
> application_1483640789847_0001 transitioned from RUNNING to null
> {noformat}






[jira] [Updated] (YARN-6068) Log aggregation get failed when NM restart even with recovery

2017-01-08 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-6068:
-
Target Version/s: 2.8.0  (was: 2.8.1)

> Log aggregation get failed when NM restart even with recovery
> -
>
> Key: YARN-6068
> URL: https://issues.apache.org/jira/browse/YARN-6068
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: YARN-6068-v2.patch, YARN-6068.patch
>
>
> The exception log is as follows:
> {noformat}
> 2017-01-05 19:16:36,352 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:abortLogAggregation(527)) - Aborting log 
> aggregation for application_1483640789847_0001
> 2017-01-05 19:16:36,352 WARN  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:run(399)) - Aggregation did not complete for 
> application application_1483640789847_0001
> 2017-01-05 19:16:36,353 WARN  application.ApplicationImpl 
> (ApplicationImpl.java:handle(461)) - Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> APPLICATION_LOG_HANDLING_FAILED at RUNNING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:459)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:64)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1084)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1076)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> 2017-01-05 19:16:36,355 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(464)) - Application 
> application_1483640789847_0001 transitioned from RUNNING to null
> {noformat}






[jira] [Commented] (YARN-6068) Log aggregation get failed when NM restart even with recovery

2017-01-08 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15810406#comment-15810406
 ] 

Junping Du commented on YARN-6068:
--

Marking this as a blocker for 2.8 as the issue will break the NM restart 
work-preserving feature. Can someone take a quick look and commit it, as our RC 
is almost out of the door?

> Log aggregation get failed when NM restart even with recovery
> -
>
> Key: YARN-6068
> URL: https://issues.apache.org/jira/browse/YARN-6068
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Blocker
> Attachments: YARN-6068-v2.patch, YARN-6068.patch
>
>
> The exception log is as follows:
> {noformat}
> 2017-01-05 19:16:36,352 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:abortLogAggregation(527)) - Aborting log 
> aggregation for application_1483640789847_0001
> 2017-01-05 19:16:36,352 WARN  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:run(399)) - Aggregation did not complete for 
> application application_1483640789847_0001
> 2017-01-05 19:16:36,353 WARN  application.ApplicationImpl 
> (ApplicationImpl.java:handle(461)) - Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> APPLICATION_LOG_HANDLING_FAILED at RUNNING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:459)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:64)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1084)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1076)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> 2017-01-05 19:16:36,355 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(464)) - Application 
> application_1483640789847_0001 transitioned from RUNNING to null
> {noformat}






[jira] [Commented] (YARN-6068) Log aggregation get failed when NM restart even with recovery

2017-01-08 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15810487#comment-15810487
 ] 

Varun Saxena commented on YARN-6068:


+1, LGTM.
Will commit it shortly.

> Log aggregation get failed when NM restart even with recovery
> -
>
> Key: YARN-6068
> URL: https://issues.apache.org/jira/browse/YARN-6068
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Blocker
> Attachments: YARN-6068-v2.patch, YARN-6068.patch
>
>
> The exception log is as follows:
> {noformat}
> 2017-01-05 19:16:36,352 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:abortLogAggregation(527)) - Aborting log 
> aggregation for application_1483640789847_0001
> 2017-01-05 19:16:36,352 WARN  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:run(399)) - Aggregation did not complete for 
> application application_1483640789847_0001
> 2017-01-05 19:16:36,353 WARN  application.ApplicationImpl 
> (ApplicationImpl.java:handle(461)) - Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> APPLICATION_LOG_HANDLING_FAILED at RUNNING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:459)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:64)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1084)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1076)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> 2017-01-05 19:16:36,355 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(464)) - Application 
> application_1483640789847_0001 transitioned from RUNNING to null
> {noformat}






[jira] [Commented] (YARN-5937) stop-yarn.sh is not able to gracefully stop node managers

2017-01-08 Thread Weiwei Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15810541#comment-15810541
 ] 

Weiwei Yang commented on YARN-5937:
---

Hello [~Naganarasimha] 

Thanks a lot for looking into this one. Any updates?

> stop-yarn.sh is not able to gracefully stop node managers
> -
>
> Key: YARN-5937
> URL: https://issues.apache.org/jira/browse/YARN-5937
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Weiwei Yang
>Assignee: Weiwei Yang
>  Labels: script
> Attachments: YARN-5937.01.patch, nm_shutdown.log
>
>
> stop-yarn.sh always gives the following output:
> {code}
> ./sbin/stop-yarn.sh
> Stopping resourcemanager
> Stopping nodemanagers
> : WARNING: nodemanager did not stop gracefully after 5 seconds: 
> Trying to kill with kill -9
> : ERROR: Unable to kill 18097
> {code}
> This is because the resource manager is stopped before the node managers. When 
> the shutdown hook manager tries to gracefully stop NM services, the NM needs to 
> unregister with the RM, and it times out because the NM cannot connect to the RM 
> (already stopped). See the log (stop RM, then run kill):
> {code}
> 16/11/28 08:26:43 ERROR nodemanager.NodeManager: RECEIVED SIGNAL 15: SIGTERM
> ...
> 16/11/28 08:26:53 WARN util.ShutdownHookManager: ShutdownHook 
> 'CompositeServiceShutdownHook' timeout, java.util.concurrent.TimeoutException
> java.util.concurrent.TimeoutException
>   at java.util.concurrent.FutureTask.get(FutureTask.java:205)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:67)
> ...
>   at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.unRegisterNM(NodeStatusUpdaterImpl.java:291)
> ...
> 16/11/28 08:27:13 ERROR util.ShutdownHookManager: ShutdownHookManger shutdown 
> forcefully.
> {code}
> The shutdown hook has a default timeout of 10s, so if the RM is stopped before 
> the NMs, they always take more than 10s to stop (in Java code). However, 
> stop-yarn.sh only gives a 5s timeout, so the NM is always killed instead of stopped.
> It would make sense for this script to stop the NMs before the RM, in a graceful way.






[jira] [Commented] (YARN-6068) Log aggregation get failed when NM restart even with recovery

2017-01-08 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15810560#comment-15810560
 ] 

Varun Saxena commented on YARN-6068:


Committed to trunk, branch-2 and branch-2.8.
Thanks [~djp] for raising the issue and fixing it.

> Log aggregation get failed when NM restart even with recovery
> -
>
> Key: YARN-6068
> URL: https://issues.apache.org/jira/browse/YARN-6068
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Blocker
> Fix For: 2.8.0, 2.9.0, 3.0.0-alpha2
>
> Attachments: YARN-6068-v2.patch, YARN-6068.patch
>
>
> The exception log is as follows:
> {noformat}
> 2017-01-05 19:16:36,352 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:abortLogAggregation(527)) - Aborting log 
> aggregation for application_1483640789847_0001
> 2017-01-05 19:16:36,352 WARN  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:run(399)) - Aggregation did not complete for 
> application application_1483640789847_0001
> 2017-01-05 19:16:36,353 WARN  application.ApplicationImpl 
> (ApplicationImpl.java:handle(461)) - Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> APPLICATION_LOG_HANDLING_FAILED at RUNNING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:459)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:64)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1084)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1076)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> 2017-01-05 19:16:36,355 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(464)) - Application 
> application_1483640789847_0001 transitioned from RUNNING to null
> {noformat}






[jira] [Commented] (YARN-6068) Log aggregation get failed when NM restart even with recovery

2017-01-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15810587#comment-15810587
 ] 

Hudson commented on YARN-6068:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11088 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11088/])
YARN-6068. Log aggregation get failed when NM restart even with recovery 
(varunsaxena: rev f59e36b4ce71d3019ab91b136b6d7646316954e7)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java


> Log aggregation get failed when NM restart even with recovery
> -
>
> Key: YARN-6068
> URL: https://issues.apache.org/jira/browse/YARN-6068
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Blocker
> Fix For: 2.8.0, 2.9.0, 3.0.0-alpha2
>
> Attachments: YARN-6068-v2.patch, YARN-6068.patch
>
>
> The exception log is as follows:
> {noformat}
> 2017-01-05 19:16:36,352 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:abortLogAggregation(527)) - Aborting log 
> aggregation for application_1483640789847_0001
> 2017-01-05 19:16:36,352 WARN  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:run(399)) - Aggregation did not complete for 
> application application_1483640789847_0001
> 2017-01-05 19:16:36,353 WARN  application.ApplicationImpl 
> (ApplicationImpl.java:handle(461)) - Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> APPLICATION_LOG_HANDLING_FAILED at RUNNING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:459)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:64)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1084)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:1076)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> 2017-01-05 19:16:36,355 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:handle(464)) - Application 
> application_1483640789847_0001 transitioned from RUNNING to null
> {noformat}






[jira] [Assigned] (YARN-6072) RM unable to start in secure mode

2017-01-08 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R reassigned YARN-6072:
---

Assignee: Naganarasimha G R

> RM unable to start in secure mode
> -
>
> Key: YARN-6072
> URL: https://issues.apache.org/jira/browse/YARN-6072
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Naganarasimha G R
>Priority: Blocker
> Attachments: hadoop-secureuser-resourcemanager-vm1.log
>
>
> Resource manager is unable to start in secure mode
> {code}
> 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found 
> resource hadoop-policy.xml at 
> file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml
> 2017-01-08 14:27:29,918 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,919 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed 
> so firing fatal event
> org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket 
> Reader #1 for port 8033
> 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll 
> during transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> ... 5 more
> {code}
> ResourceManager services are added in the following order
> # EmbeddedElector
> # AdminService
> During ResourceManager service start(), EmbeddedElector starts first and 
> invokes {{AdminService#refreshAll()}}, but {{AdminService#serviceStart()}} only 
> happens after {{ActiveStandbyElectorBasedElectorService}} service start is 
> complete. So {{AdminService#server}} will be *null*, which causes  

[jira] [Assigned] (YARN-6072) RM unable to start in secure mode

2017-01-08 Thread Ajith S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S reassigned YARN-6072:
-

Assignee: Ajith S  (was: Naganarasimha G R)

> RM unable to start in secure mode
> -
>
> Key: YARN-6072
> URL: https://issues.apache.org/jira/browse/YARN-6072
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Ajith S
>Priority: Blocker
> Attachments: hadoop-secureuser-resourcemanager-vm1.log
>
>
> Resource manager is unable to start in secure mode
> {code}
> 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found 
> resource hadoop-policy.xml at 
> file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml
> 2017-01-08 14:27:29,918 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,919 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed 
> so firing fatal event
> org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket 
> Reader #1 for port 8033
> 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll 
> during transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> ... 5 more
> {code}
> ResourceManager services are added in the following order
> # EmbeddedElector
> # AdminService
> During ResourceManager service start(), EmbeddedElector starts first and 
> invokes {{AdminService#refreshAll()}}, but {{AdminService#serviceStart()}} only 
> happens after {{ActiveStandbyElectorBasedElectorService}} service start is 
> complete. So {{AdminService#server}} will be *null*, which causes  
> {{AdminServi

[jira] [Commented] (YARN-6073) Misuse of format specifier in Preconditions.checkArgument

2017-01-08 Thread Yuanbo Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15810780#comment-15810780
 ] 

Yuanbo Liu commented on YARN-6073:
--

This JIRA could be a good starting point for my first YARN patch. 
[~yzhangal], would you mind assigning this JIRA to me, since I don't have the 
privilege to assign YARN JIRAs to myself?

> Misuse of format specifier in Preconditions.checkArgument
> -
>
> Key: YARN-6073
> URL: https://issues.apache.org/jira/browse/YARN-6073
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Priority: Trivial
>
> RMAdminCLI.java
> {code}
>  int nLabels = map.get(nodeId).size();
>   Preconditions.checkArgument(nLabels <= 1, "%d labels specified on 
> host=%s"
>   + ", please note that we do not support specifying multiple"
>   + " labels on a single host for now.", nLabels, nodeIdStr);
> {code}
> The {{%d}} should be replaced with {{%s}}, per
> https://google.github.io/guava/releases/19.0/api/docs/com/google/common/base/Preconditions.html
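
Guava's {{Preconditions.checkArgument}} substitutes only {{%s}} placeholders in 
its message template, so the corrected call would look roughly like the sketch 
below (variable names are taken from the snippet above; this is a sketch, not 
necessarily the attached patch):
{code}
int nLabels = map.get(nodeId).size();
Preconditions.checkArgument(nLabels <= 1,
    "%s labels specified on host=%s"
        + ", please note that we do not support specifying multiple"
        + " labels on a single host for now.",
    nLabels, nodeIdStr);
{code}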



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org




[jira] [Commented] (YARN-6073) Misuse of format specifier in Preconditions.checkArgument

2017-01-08 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15810798#comment-15810798
 ] 

Yongjun Zhang commented on YARN-6073:
-

Sure [~yuanbo]. It seems someone needs to add your name to the YARN contributor 
list before the JIRA can be assigned to you (I tried and could not).


> Misuse of format specifier in Preconditions.checkArgument
> -
>
> Key: YARN-6073
> URL: https://issues.apache.org/jira/browse/YARN-6073
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Priority: Trivial
>
> RMAdminCLI.java
> {code}
>  int nLabels = map.get(nodeId).size();
>   Preconditions.checkArgument(nLabels <= 1, "%d labels specified on 
> host=%s"
>   + ", please note that we do not support specifying multiple"
>   + " labels on a single host for now.", nLabels, nodeIdStr);
> {code}
> The {{%d}} should be replaced with {{%s}}, per
> https://google.github.io/guava/releases/19.0/api/docs/com/google/common/base/Preconditions.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3774) ZKRMStateStore should use Curator 3.0 and avail CuratorOp

2017-01-08 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3774:
---
Priority: Critical  (was: Blocker)

> ZKRMStateStore should use Curator 3.0 and avail CuratorOp
> -
>
> Key: YARN-3774
> URL: https://issues.apache.org/jira/browse/YARN-3774
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
>
> YARN-2716 changes ZKRMStateStore to use Curator. Transactions added there are 
> somewhat involved, and could be improved using CuratorOp introduced in 
> Curator 3.0. Hadoop 3.0.0 would be a good time to upgrade the Curator version 
> and make this change. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3774) ZKRMStateStore should use Curator 3.0 and avail CuratorOp

2017-01-08 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3774:
---
Issue Type: Improvement  (was: Bug)

> ZKRMStateStore should use Curator 3.0 and avail CuratorOp
> -
>
> Key: YARN-3774
> URL: https://issues.apache.org/jira/browse/YARN-3774
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
>
> YARN-2716 changes ZKRMStateStore to use Curator. Transactions added there are 
> somewhat involved, and could be improved using CuratorOp introduced in 
> Curator 3.0. Hadoop 3.0.0 would be a good time to upgrade the Curator version 
> and make this change. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3774) ZKRMStateStore should use Curator 3.0 and avail CuratorOp

2017-01-08 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3774:
---
Target Version/s: 3.0.0-alpha2  (was: )

> ZKRMStateStore should use Curator 3.0 and avail CuratorOp
> -
>
> Key: YARN-3774
> URL: https://issues.apache.org/jira/browse/YARN-3774
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
>
> YARN-2716 changes ZKRMStateStore to use Curator. Transactions added there are 
> somewhat involved, and could be improved using CuratorOp introduced in 
> Curator 3.0. Hadoop 3.0.0 would be a good time to upgrade the Curator version 
> and make this change. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-3774) ZKRMStateStore should use Curator 3.0 and avail CuratorOp

2017-01-08 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3774:
---
Description: 
YARN-2716 changes ZKRMStateStore to use Curator. Transactions added there are 
somewhat involved, and could be improved using CuratorOp introduced in Curator 
3.0. Hadoop 3.0.0 would be a good time to upgrade the Curator version and make 
this change. 

Curator is considering shading guava through CURATOR-200. In Hadoop 3, we 
should upgrade to the next Curator version.

  was:YARN-2716 changes ZKRMStateStore to use Curator. Transactions added there 
are somewhat involved, and could be improved using CuratorOp introduced in 
Curator 3.0. Hadoop 3.0.0 would be a good time to upgrade the Curator version 
and make this change. 


> ZKRMStateStore should use Curator 3.0 and avail CuratorOp
> -
>
> Key: YARN-3774
> URL: https://issues.apache.org/jira/browse/YARN-3774
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
>
> YARN-2716 changes ZKRMStateStore to use Curator. Transactions added there are 
> somewhat involved, and could be improved using CuratorOp introduced in 
> Curator 3.0. Hadoop 3.0.0 would be a good time to upgrade the Curator version 
> and make this change. 
> Curator is considering shading guava through CURATOR-200. In Hadoop 3, we 
> should upgrade to the next Curator version.
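
For reference, a minimal sketch of the CuratorOp style mentioned above, assuming 
the Curator 3.x transaction API; the znode paths and payloads here are 
illustrative placeholders, not ZKRMStateStore's actual layout:
{code}
// Illustrative only: build the operations first, then commit them atomically.
CuratorOp createApp = curator.transactionOp().create()
    .forPath("/rmstore/app_0001", appData);
CuratorOp updateEpoch = curator.transactionOp().setData()
    .forPath("/rmstore/epoch", epochData);
List<CuratorTransactionResult> results =
    curator.transaction().forOperations(createApp, updateEpoch);
{code}
Here {{curator}}, {{appData}} and {{epochData}} are assumed to be an already 
started CuratorFramework client and two byte arrays.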



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-6073) Misuse of format specifier in Preconditions.checkArgument

2017-01-08 Thread Yuanbo Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuanbo Liu reassigned YARN-6073:


Assignee: Yuanbo Liu

> Misuse of format specifier in Preconditions.checkArgument
> -
>
> Key: YARN-6073
> URL: https://issues.apache.org/jira/browse/YARN-6073
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yuanbo Liu
>Priority: Trivial
>
> RMAdminCLI.java
> {code}
>  int nLabels = map.get(nodeId).size();
>   Preconditions.checkArgument(nLabels <= 1, "%d labels specified on 
> host=%s"
>   + ", please note that we do not support specifying multiple"
>   + " labels on a single host for now.", nLabels, nodeIdStr);
> {code}
> The {{%d}} should be replaced with {{%s}}, per
> https://google.github.io/guava/releases/19.0/api/docs/com/google/common/base/Preconditions.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6073) Misuse of format specifier in Preconditions.checkArgument

2017-01-08 Thread Yuanbo Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuanbo Liu updated YARN-6073:
-
Attachment: YARN-6073.001.patch

Uploaded the v1 patch for this JIRA.

> Misuse of format specifier in Preconditions.checkArgument
> -
>
> Key: YARN-6073
> URL: https://issues.apache.org/jira/browse/YARN-6073
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Yongjun Zhang
>Assignee: Yuanbo Liu
>Priority: Trivial
> Attachments: YARN-6073.001.patch
>
>
> RMAdminCLI.java
> {code}
>  int nLabels = map.get(nodeId).size();
>   Preconditions.checkArgument(nLabels <= 1, "%d labels specified on 
> host=%s"
>   + ", please note that we do not support specifying multiple"
>   + " labels on a single host for now.", nLabels, nodeIdStr);
> {code}
> The {{%d}} should be replaced with {{%s}}, per
> https://google.github.io/guava/releases/19.0/api/docs/com/google/common/base/Preconditions.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5709) Cleanup leader election configs and pluggability

2017-01-08 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15810832#comment-15810832
 ] 

Sunil G commented on YARN-5709:
---

Looks like EmbeddedElector is started earlier than AdminService, causing the 
failure mentioned in YARN-6072.

> Cleanup leader election configs and pluggability
> 
>
> Key: YARN-5709
> URL: https://issues.apache.org/jira/browse/YARN-5709
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: yarn-5709-branch-2.8.01.patch, 
> yarn-5709-branch-2.8.02.patch, yarn-5709-branch-2.8.03.patch, 
> yarn-5709-branch-2.8.patch, yarn-5709-wip.2.patch, yarn-5709.1.patch, 
> yarn-5709.2.patch, yarn-5709.3.patch, yarn-5709.4.patch
>
>
> While reviewing YARN-5677 and YARN-5694, I noticed we could make the 
> curator-based election code cleaner. It is nicer to get this fixed in 2.8 
> before we ship it, but this can be done at a later time as well. 
> # By EmbeddedElector, we meant it was running as part of the RM daemon. Since 
> the Curator-based elector is also running embedded, I feel the code should be 
> checking for {{!curatorBased}} instead of {{isEmbeddedElector}}
> # {{LeaderElectorService}} should probably be named 
> {{CuratorBasedEmbeddedElectorService}} or some such.
> # The code that initializes the elector should be at the same place 
> irrespective of whether it is curator-based or not. 
> # We seem to be caching the CuratorFramework instance in RM. It makes more 
> sense for it to be in RMContext. If others are okay with it, we might even be 
> better off having an {{RMContext#getCurator()}} method to lazily create the 
> curator framework and then cache it (a minimal sketch follows this 
> description). 
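
A minimal sketch of the lazily created, cached handle suggested in the last 
item above; the field and configuration names are placeholders rather than 
RMContext's actual ones:
{code}
private CuratorFramework curator;

public synchronized CuratorFramework getCurator() {
  // Create the client on first use and cache it for subsequent callers.
  if (curator == null) {
    curator = CuratorFrameworkFactory.newClient(
        zkConnectString, new ExponentialBackoffRetry(1000, 3));
    curator.start();
  }
  return curator;
}
{code}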



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6072) RM unable to start in secure mode

2017-01-08 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15810834#comment-15810834
 ] 

Sunil G commented on YARN-6072:
---

+ [~kasha] [~jianhe] too.

> RM unable to start in secure mode
> -
>
> Key: YARN-6072
> URL: https://issues.apache.org/jira/browse/YARN-6072
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Ajith S
>Priority: Blocker
> Attachments: hadoop-secureuser-resourcemanager-vm1.log
>
>
> Resource manager is unable to start in secure mode
> {code}
> 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found 
> resource hadoop-policy.xml at 
> file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml
> 2017-01-08 14:27:29,918 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,919 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed 
> so firing fatal event
> org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket 
> Reader #1 for port 8033
> 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll 
> during transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> ... 5 more
> {code}
> ResourceManager services are added in the following order
> # EmbeddedElector
> # AdminService
> During ResourceManager service start(), EmbeddedElector starts first and 
> invokes {{AdminService#refreshAll()}}, but {{AdminService#serviceStart()}} only 
> happens after {{ActiveStandbyElectorBasedElectorService}} service start is 
> complete. So {{AdminService#server}} will be *null* 

[jira] [Updated] (YARN-6072) RM unable to start in secure mode

2017-01-08 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-6072:
---
Affects Version/s: 3.0.0-alpha2
   2.8.0

> RM unable to start in secure mode
> -
>
> Key: YARN-6072
> URL: https://issues.apache.org/jira/browse/YARN-6072
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Bibin A Chundatt
>Assignee: Ajith S
>Priority: Blocker
> Attachments: hadoop-secureuser-resourcemanager-vm1.log
>
>
> Resource manager is unable to start in secure mode
> {code}
> 2017-01-08 14:27:29,917 INFO org.apache.hadoop.conf.Configuration: found 
> resource hadoop-policy.xml at 
> file:/opt/hadoop/release/hadoop-3.0.0-alpha2-SNAPSHOT/etc/hadoop/hadoop-policy.xml
> 2017-01-08 14:27:29,918 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: Refresh All
> java.lang.NullPointerException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:569)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshServiceAcls(AdminService.java:552)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:707)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,919 ERROR 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService: RefreshAll failed 
> so firing fatal event
> org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2017-01-08 14:27:29,920 INFO org.apache.hadoop.ipc.Server: Starting Socket 
> Reader #1 for port 8033
> 2017-01-08 14:27:29,948 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
> Exception handling the winning of election
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:888)
> at 
> org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
> at 
> org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error on refreshAll 
> during transition to Active
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:311)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:142)
> ... 4 more
> Caused by: org.apache.hadoop.ha.ServiceFailedException
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:712)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:302)
> ... 5 more
> {code}
> ResourceManager services are added in the following order
> # EmbeddedElector
> # AdminService
> During ResourceManager service start(), EmbeddedElector starts first and 
> invokes {{AdminService#refreshAll()}}, but {{AdminService#serviceStart()}} only 
> happens after {{ActiveStandbyElectorBasedElectorService}} service start is 
> complete. 

[jira] [Created] (YARN-6074) FlowRunEntity does not deserialize long values in efficient way.

2017-01-08 Thread Rohith Sharma K S (JIRA)
Rohith Sharma K S created YARN-6074:
---

 Summary: FlowRunEntity does not deserialize long values in 
efficient way. 
 Key: YARN-6074
 URL: https://issues.apache.org/jira/browse/YARN-6074
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelinereader
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S


I see that the FlowRunEntity methods *getRunId()* and *getMaxEndTime()* do not 
deserialize long values in an efficient way, which causes a class cast 
exception depending on the number.
{code}
  public long getRunId() {
Object runId = getInfo().get(FLOW_RUN_ID_INFO_KEY);
return runId == null ? 0L : (Long) runId;
  }
{code} 
and 
{code}
  public long getMaxEndTime() {
Object time = getInfo().get(FLOW_RUN_END_TIME);
return time == null ? 0L : (Long)time;
  }
{code} 

The reason for the class cast exception is that JSON has the data type Number, 
which covers all Java numeric primitives. So, if the number is within the range 
of Integer max, the Object is converted to an Integer, which then fails to cast 
to Long. 
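
A minimal sketch of the Number-based fix (not necessarily the attached patch): 
cast the deserialized value to {{Number}} and take its long value, so both 
Integer and Long inputs work:
{code}
  public long getRunId() {
    Object runId = getInfo().get(FLOW_RUN_ID_INFO_KEY);
    // JSON numbers within the Integer range may deserialize as Integer, so go
    // through Number instead of casting directly to Long.
    return runId == null ? 0L : ((Number) runId).longValue();
  }
{code}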



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6074) FlowRunEntity does not deserialize long values in efficient way.

2017-01-08 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-6074:

Attachment: YARN-6074.patch

Updated the patch to cast to Number and take the long value. 

> FlowRunEntity does not deserialize long values in efficient way. 
> -
>
> Key: YARN-6074
> URL: https://issues.apache.org/jira/browse/YARN-6074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: YARN-6074.patch
>
>
> I see that the FlowRunEntity methods *getRunId()* and *getMaxEndTime()* do not 
> deserialize long values in an efficient way, which causes a class cast 
> exception depending on the number.
> {code}
>   public long getRunId() {
> Object runId = getInfo().get(FLOW_RUN_ID_INFO_KEY);
> return runId == null ? 0L : (Long) runId;
>   }
> {code} 
> and 
> {code}
>   public long getMaxEndTime() {
> Object time = getInfo().get(FLOW_RUN_END_TIME);
> return time == null ? 0L : (Long)time;
>   }
> {code} 
> The reason for the class cast exception is that JSON has the data type 
> Number, which covers all Java numeric primitives. So, if the number is within 
> the range of Integer max, the Object is converted to an Integer, which then 
> fails to cast to Long. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6060) Linux container executor fails to run container on directories mounted as noexec

2017-01-08 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15810878#comment-15810878
 ] 

Varun Vasudev commented on YARN-6060:
-

[~miklos.szeg...@cloudera.com] - the attached patch will make debugging 
failures a lot harder. For example, Slider localizes the user's application code 
and then launches the application. With your patch in place, the Slider agent 
will get launched but subsequent launches will fail. To make things worse, MR 
apps will run fine. Today, all apps fail, which makes the problem easier to 
debug. 

I agree with [~aw] - directories mounted with noexec are a bad configuration.

> Linux container executor fails to run container on directories mounted as 
> noexec
> 
>
> Key: YARN-6060
> URL: https://issues.apache.org/jira/browse/YARN-6060
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager, yarn
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
> Attachments: YARN-6060.000.patch, YARN-6060.001.patch
>
>
> If node manager directories are mounted as noexec, LCE fails with the 
> following error:
> Launching container...
> Couldn't execute the container launch file 
> /tmp/hadoop-/nm-local-dir/usercache//appcache/application_1483656052575_0001/container_1483656052575_0001_02_01/launch_container.sh
>  - Permission denied



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6074) FlowRunEntity does not deserialize long values in efficient way.

2017-01-08 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15810884#comment-15810884
 ] 

Varun Saxena commented on YARN-6074:


I think this can go into trunk, right?

> FlowRunEntity does not deserialize long values in efficient way. 
> -
>
> Key: YARN-6074
> URL: https://issues.apache.org/jira/browse/YARN-6074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: YARN-6074.patch
>
>
> I see that the FlowRunEntity methods *getRunId()* and *getMaxEndTime()* do not 
> deserialize long values in an efficient way, which causes a class cast 
> exception depending on the number.
> {code}
>   public long getRunId() {
> Object runId = getInfo().get(FLOW_RUN_ID_INFO_KEY);
> return runId == null ? 0L : (Long) runId;
>   }
> {code} 
> and 
> {code}
>   public long getMaxEndTime() {
> Object time = getInfo().get(FLOW_RUN_END_TIME);
> return time == null ? 0L : (Long)time;
>   }
> {code} 
> The reason for the class cast exception is that JSON has the data type 
> Number, which covers all Java numeric primitives. So, if the number is within 
> the range of Integer max, the Object is converted to an Integer, which then 
> fails to cast to Long. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6074) FlowRunEntity does not deserialize long values in efficient way.

2017-01-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15810921#comment-15810921
 ] 

Hadoop QA commented on YARN-6074:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
25s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 20m 33s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | YARN-6074 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12846268/YARN-6074.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 215da7e16072 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 
17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / f59e36b |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/14605/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/14605/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> FlowRunEntity does not deserialize long values in efficient way. 
> -
>
> Key: YARN-6074
> URL: https://issues.apache.org/jira/browse/YARN-6074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: YARN-6074.patch
>
>
> I see

[jira] [Updated] (YARN-5709) Cleanup leader election configs and pluggability

2017-01-08 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-5709:
-
Fix Version/s: (was: 2.9.0)
   2.8.0

> Cleanup leader election configs and pluggability
> 
>
> Key: YARN-5709
> URL: https://issues.apache.org/jira/browse/YARN-5709
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Critical
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: yarn-5709-branch-2.8.01.patch, 
> yarn-5709-branch-2.8.02.patch, yarn-5709-branch-2.8.03.patch, 
> yarn-5709-branch-2.8.patch, yarn-5709-wip.2.patch, yarn-5709.1.patch, 
> yarn-5709.2.patch, yarn-5709.3.patch, yarn-5709.4.patch
>
>
> While reviewing YARN-5677 and YARN-5694, I noticed we could make the 
> curator-based election code cleaner. It is nicer to get this fixed in 2.8 
> before we ship it, but this can be done at a later time as well. 
> # By EmbeddedElector, we meant it was running as part of the RM daemon. Since 
> the Curator-based elector is also running embedded, I feel the code should be 
> checking for {{!curatorBased}} instead of {{isEmbeddedElector}}
> # {{LeaderElectorService}} should probably be named 
> {{CuratorBasedEmbeddedElectorService}} or some such.
> # The code that initializes the elector should be at the same place 
> irrespective of whether it is curator-based or not. 
> # We seem to be caching the CuratorFramework instance in RM. It makes more 
> sense for it to be in RMContext. If others are okay with it, we might even be 
> better off having an {{RMContext#getCurator()}} method to lazily create the 
> curator framework and then cache it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6074) FlowRunEntity does not deserialize long values in efficient way.

2017-01-08 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15810953#comment-15810953
 ] 

Rohith Sharma K S commented on YARN-6074:
-

bq. I think this can go into trunk, right?
Yup. It needs to be committed to branch yarn-5355 as well. 

> FlowRunEntity does not deserialize long values in efficient way. 
> -
>
> Key: YARN-6074
> URL: https://issues.apache.org/jira/browse/YARN-6074
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelinereader
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
> Attachments: YARN-6074.patch
>
>
> I see that the FlowRunEntity methods *getRunId()* and *getMaxEndTime()* do not 
> deserialize long values in an efficient way, which causes a class cast 
> exception depending on the number.
> {code}
>   public long getRunId() {
> Object runId = getInfo().get(FLOW_RUN_ID_INFO_KEY);
> return runId == null ? 0L : (Long) runId;
>   }
> {code} 
> and 
> {code}
>   public long getMaxEndTime() {
> Object time = getInfo().get(FLOW_RUN_END_TIME);
> return time == null ? 0L : (Long)time;
>   }
> {code} 
> The reason for the class cast exception is that JSON has the data type 
> Number, which covers all Java numeric primitives. So, if the number is within 
> the range of Integer max, the Object is converted to an Integer, which then 
> fails to cast to Long. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6050) AMs can't be scheduled on racks or nodes

2017-01-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15810963#comment-15810963
 ] 

Wangda Tan commented on YARN-6050:
--

[~rkanter], thanks for the explanation.

I agree with most of your points. 

However, even though ApplicationMasterProtocol is used as an internal protocol, 
by definition it is still a public API, so making sure it can be used without 
surprises is very important. 

My concerns are: adding hard locality may conflict with AM blacklisting 
behavior, and we need to highlight the hard locality to end users so customers 
can easily troubleshoot why their applications get stuck. 

I think the direction of this JIRA is correct: applications should have a way 
to specify AM container placement requirements, but since the AM container is 
especially important, we need to be very careful here.

Just my two cents. I think it's better to get more input before proceeding. We 
can discuss further after your vacation. :) Have a nice vacation!

> AMs can't be scheduled on racks or nodes
> 
>
> Key: YARN-6050
> URL: https://issues.apache.org/jira/browse/YARN-6050
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0, 3.0.0-alpha2
>Reporter: Robert Kanter
>Assignee: Robert Kanter
> Attachments: YARN-6050.001.patch, YARN-6050.002.patch, 
> YARN-6050.003.patch, YARN-6050.004.patch, YARN-6050.005.patch
>
>
> Yarn itself supports rack/node aware scheduling for AMs; however, there 
> currently are two problems:
> # To specify hard or soft rack/node requests, you have to specify more than 
> one {{ResourceRequest}}.  For example, if you want to schedule an AM only on 
> "rackA", you have to create two {{ResourceRequest}}, like this:
> {code}
> ResourceRequest.newInstance(PRIORITY, ANY, CAPABILITY, NUM_CONTAINERS, false);
> ResourceRequest.newInstance(PRIORITY, "rackA", CAPABILITY, NUM_CONTAINERS, 
> true);
> {code}
> The problem is that the Yarn API doesn't actually allow you to specify more 
> than one {{ResourceRequest}} in the {{ApplicationSubmissionContext}}.  The 
> current behavior is to either build one from {{getResource}} or directly from 
> {{getAMContainerResourceRequest}}, depending on whether 
> {{getAMContainerResourceRequest}} is null or not.  We'll need to add a third 
> method, say {{getAMContainerResourceRequests}}, which takes a list of 
> {{ResourceRequest}} so that clients can specify multiple resource requests (a 
> hedged sketch follows this description).
> # There are some places where things are hardcoded to overwrite what the 
> client specifies.  These are pretty straightforward to fix.
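
A hedged sketch of how a client might express the hard rack request once a 
list-valued method exists; {{setAMContainerResourceRequests}} is assumed here 
as the setter counterpart of the proposed getter, and {{priority}}, 
{{capability}} and {{appContext}} are placeholders, mirroring the snippet in 
the description:
{code}
// Illustrative only: the ANY fallback plus a hard request pinned to rackA,
// matching the two ResourceRequests shown in the description above.
List<ResourceRequest> amRequests = Arrays.asList(
    ResourceRequest.newInstance(priority, ResourceRequest.ANY, capability, 1, false),
    ResourceRequest.newInstance(priority, "rackA", capability, 1, true));
appContext.setAMContainerResourceRequests(amRequests);
{code}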



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org