[jira] [Commented] (YARN-5178) yarn application never can be killed when failover resource manager

2016-05-29 Thread tu nguyen khac (JIRA)

[ https://issues.apache.org/jira/browse/YARN-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15306206#comment-15306206 ]

tu nguyen khac commented on YARN-5178:
--

[~hex108] Thanks for your help, I will try this patch.

> yarn application never can be killed when failover resource manager
> ---
>
> Key: YARN-5178
> URL: https://issues.apache.org/jira/browse/YARN-5178
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: tu nguyen khac
> Priority: Minor
> Attachments: rs1.zip, rs2.zip
>
>
> Dear all,
> The problem I detected is the following.
> In my cluster environment (16 nodes, 2 ResourceManagers, HA), an application
> was submitted to the first ResourceManager (Rs1). Suddenly the Rs1 machine
> hung; the application failed over to Rs2, but it could never run:
> Name: cpaBidEcom
> Application Type: SPARK
> Application Tags:
> State: ACCEPTED
> FinalStatus: UNDEFINED
> Started: 28-May-2016 01:46:13
> Elapsed: 7hrs, 35mins, 32sec
> Tracking URL: UNASSIGNED
> After that, our developer tried to kill the application with the command:
> yarn application -kill app_
> and we received this output forever:
> 16/05/28 09:24:48 INFO impl.YarnClientImpl: Waiting for application application_1464374175189_0016 to be killed.
> 16/05/28 09:24:50 INFO impl.YarnClientImpl: Waiting for application application_1464374175189_0016 to be killed.
> 16/05/28 09:24:52 INFO impl.YarnClientImpl: Waiting for application application_1464374175189_0016 to be killed.
> 16/05/28 09:24:54 INFO impl.YarnClientImpl: Waiting for application application_1464374175189_0016 to be killed.
> 16/05/28 09:24:56 INFO impl.YarnClientImpl: Waiting for application application_1464374175189_0016 to be killed.
> 16/05/28 09:24:58 INFO impl.YarnClientImpl: Waiting for application application_1464374175189_0016 to be killed.
> 16/05/28 09:25:00 INFO impl.YarnClientImpl: Waiting for application application_1464374175189_0016 to be killed.
> 16/05/28 09:25:02 INFO impl.YarnClientImpl: Waiting for application application_1464374175189_0016 to be killed.
> 16/05/28 09:25:04 INFO impl.YarnClientImpl: Waiting for application application_1464374175189_0016 to be killed.
> 16/05/28 09:25:06 INFO impl.YarnClientImpl: Waiting for application application_1464374175189_0016 to be killed.
> 16/05/28 09:25:08 INFO impl.YarnClientImpl: Waiting for application application_1464374175189_0016 to be killed.
> 16/05/28 09:25:10 INFO impl.YarnClientImpl: Waiting for application application_1464374175189_0016 to be killed.
> 16/05/28 09:25:12 INFO impl.YarnClientImpl: Waiting for application application_1464374175189_0016 to be killed.
> 16/05/28 09:25:14 INFO impl.YarnClientImpl: Waiting for application application_1464374175189_0016 to be killed.
> 16/05/28 09:25:16 INFO impl.YarnClientImpl: Waiting for application application_1464374175189_0016 to be killed.
> I think this is probably a bug. It's hard to reproduce, but please review it
> for me.
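The repeated "Waiting for application ... to be killed." lines are what a client would print if it keeps re-sending the kill request and polling, while the RM never reports the kill as complete. The Java sketch below is a hypothetical model of such a loop, not the YarnClientImpl source: the class KillWaitSketch, the method killAndWait and all of its parameters (poll interval, timeout, logging frequency) are illustrative assumptions. It shows how an optional client-side timeout turns the endless wait into a bounded one.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical model of a client-side "kill and wait" loop. It is not the
 * YarnClientImpl source; all names and parameters are illustrative.
 */
public class KillWaitSketch {

    enum Outcome { KILLED, TIMED_OUT, INTERRUPTED }

    /**
     * Re-sends a kill request until it reports completion.
     *
     * @param killRequest   stand-in for the RPC; returns true once the kill has completed
     * @param pollMillis    pause between requests
     * @param timeoutMillis give up after this many ms; a negative value waits forever
     * @param logEvery      print a progress line every logEvery requests
     */
    static Outcome killAndWait(String appId, BooleanSupplier killRequest,
                               long pollMillis, long timeoutMillis, int logEvery) {
        long start = System.currentTimeMillis();
        int polls = 0;
        while (true) {
            if (killRequest.getAsBoolean()) {
                System.out.println("Killed application " + appId);
                return Outcome.KILLED;
            }
            long elapsed = System.currentTimeMillis() - start;
            if (timeoutMillis >= 0 && elapsed >= timeoutMillis) {
                System.out.println("Timed out waiting for " + appId + " to be killed.");
                return Outcome.TIMED_OUT;
            }
            if (++polls % logEvery == 0) {
                System.out.println("Waiting for application " + appId + " to be killed.");
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return Outcome.INTERRUPTED;
            }
        }
    }

    public static void main(String[] args) {
        String appId = "application_1464374175189_0016";
        // An RM that never reports the kill as complete, as in this issue.
        System.out.println(killAndWait(appId, () -> false, 10, 200, 5));
        // An RM that reports completion on the third request.
        int[] calls = {0};
        System.out.println(killAndWait(appId, () -> ++calls[0] >= 3, 10, 1000, 5));
    }
}
```

With a negative timeout the loop behaves like the output above and never returns while the RM keeps reporting the kill as incomplete. A timeout only bounds the client's wait; it does not unstick the application on the RM side.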



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5178) yarn application never can be killed when failover resource manager

2016-05-28 Thread Jun Gong (JIRA)

[ https://issues.apache.org/jira/browse/YARN-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15305329#comment-15305329 ]

Jun Gong commented on YARN-5178:


Thanks [~tuyuri] for sharing the logs. I analyzed the rs2 log and found that it
is the same problem as YARN-2856: the application was in the ACCEPTED state and
ignored the {{RMAppEventType.ATTEMPT_KILLED}} event. You could try the patch from
YARN-2856.
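To illustrate how ignoring ATTEMPT_KILLED in the ACCEPTED state can leave a kill hanging, here is a toy state machine in Java. The state and event names borrow from RMAppState and RMAppEventType, but the transition table is a simplified assumption, not the real ResourceManager table and not the YARN-2856 patch. It also assumes the attempt's kill notification is delivered only once; the class AcceptedKillSketch and its flag are hypothetical.

```java
import java.util.EnumMap;
import java.util.Map;

/**
 * Toy application state machine. State and event names echo RMAppState and
 * RMAppEventType, but the transitions are a simplified, hypothetical model.
 */
public class AcceptedKillSketch {

    enum State { ACCEPTED, KILLING, KILLED }

    enum Event { KILL, ATTEMPT_KILLED }

    // transitions.get(current).get(event) -> next state; a missing entry means "ignored"
    private final Map<State, Map<Event, State>> transitions = new EnumMap<>(State.class);
    private State state = State.ACCEPTED;

    AcceptedKillSketch(boolean handleAttemptKilledInAccepted) {
        for (State s : State.values()) {
            transitions.put(s, new EnumMap<>(Event.class));
        }
        // User kill: ask the attempt to die, then wait for ATTEMPT_KILLED.
        transitions.get(State.ACCEPTED).put(Event.KILL, State.KILLING);
        transitions.get(State.KILLING).put(Event.ATTEMPT_KILLED, State.KILLED);
        if (handleAttemptKilledInAccepted) {
            // Hypothetical fix: an attempt killed while ACCEPTED finishes the app.
            transitions.get(State.ACCEPTED).put(Event.ATTEMPT_KILLED, State.KILLED);
        }
    }

    void handle(Event event) {
        State next = transitions.get(state).get(event);
        if (next == null) {
            // The event is dropped and the state does not change.
            System.out.println("Ignoring " + event + " in state " + state);
            return;
        }
        state = next;
    }

    State state() {
        return state;
    }

    public static void main(String[] args) {
        for (boolean fixed : new boolean[] {false, true}) {
            AcceptedKillSketch app = new AcceptedKillSketch(fixed);
            app.handle(Event.ATTEMPT_KILLED); // attempt dies while the app is ACCEPTED
            app.handle(Event.KILL);           // user runs "yarn application -kill"
            System.out.println((fixed ? "with" : "without")
                    + " ACCEPTED/ATTEMPT_KILLED transition: " + app.state());
        }
    }
}
```

In the unfixed model the early ATTEMPT_KILLED is dropped, so the later KILL leaves the app in KILLING, waiting for an event that never arrives again. With the extra transition the app reaches KILLED and a later kill has nothing left to wait for.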




[jira] [Commented] (YARN-5178) yarn application never can be killed when failover resource manager

2016-05-27 Thread tu nguyen khac (JIRA)

[ https://issues.apache.org/jira/browse/YARN-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15305163#comment-15305163 ]

tu nguyen khac commented on YARN-5178:
--

Sorry Jun Gong, it's my mistake: I didn't stop the cluster before collecting the
logs, and so many other applications have run since then that the RM log is quite
chaotic :D :D. That makes it hard to read, but I have tried to attach it here.




[jira] [Commented] (YARN-5178) yarn application never can be killed when failover resource manager

2016-05-27 Thread Jun Gong (JIRA)

[ https://issues.apache.org/jira/browse/YARN-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15305129#comment-15305129 ]

Jun Gong commented on YARN-5178:


Thanks [~tuyuri] for reporting the issue. Could you please upload the logs of both
RMs, if possible? It seems to be caused by the RMApp being in the ACCEPTED state
when the RM HA failover started, before any RMAppAttempt had been saved.
