[jira] [Commented] (YARN-5178) yarn application never can be killed when failover resource manager
[ https://issues.apache.org/jira/browse/YARN-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15306206#comment-15306206 ]

tu nguyen khac commented on YARN-5178:
--

[~hex108] Thanks for your help, I will try this patch.

> yarn application never can be killed when failover resource manager
> ---
>
> Key: YARN-5178
> URL: https://issues.apache.org/jira/browse/YARN-5178
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: tu nguyen khac
> Priority: Minor
> Attachments: rs1.zip, rs2.zip
>
>
> Dear all,
> The problem I detected is the following:
> In my cluster environment (16 nodes, 2 ResourceManagers, HA), an application was submitted to the first ResourceManager (RS1). Suddenly the RS1 machine hung, and the application failed over to RS2, but it never ran:
> Name: cpaBidEcom
> Application Type: SPARK
> Application Tags:
> State: ACCEPTED
> FinalStatus: UNDEFINED
> Started: 28-May-2016 01:46:13
> Elapsed: 7hrs, 35mins, 32sec
> Tracking URL: UNASSIGNED
> After that, our developer tried to kill the application with the command:
> yarn application -kill app_
> and we received this output forever:
> 16/05/28 09:24:48 INFO impl.YarnClientImpl: Waiting for application application_1464374175189_0016 to be killed.
> 16/05/28 09:24:50 INFO impl.YarnClientImpl: Waiting for application application_1464374175189_0016 to be killed.
> 16/05/28 09:24:52 INFO impl.YarnClientImpl: Waiting for application application_1464374175189_0016 to be killed.
> 16/05/28 09:24:54 INFO impl.YarnClientImpl: Waiting for application application_1464374175189_0016 to be killed.
> 16/05/28 09:24:56 INFO impl.YarnClientImpl: Waiting for application application_1464374175189_0016 to be killed.
> 16/05/28 09:24:58 INFO impl.YarnClientImpl: Waiting for application application_1464374175189_0016 to be killed.
> 16/05/28 09:25:00 INFO impl.YarnClientImpl: Waiting for application application_1464374175189_0016 to be killed.
> 16/05/28 09:25:02 INFO impl.YarnClientImpl: Waiting for application application_1464374175189_0016 to be killed.
> 16/05/28 09:25:04 INFO impl.YarnClientImpl: Waiting for application application_1464374175189_0016 to be killed.
> 16/05/28 09:25:06 INFO impl.YarnClientImpl: Waiting for application application_1464374175189_0016 to be killed.
> 16/05/28 09:25:08 INFO impl.YarnClientImpl: Waiting for application application_1464374175189_0016 to be killed.
> 16/05/28 09:25:10 INFO impl.YarnClientImpl: Waiting for application application_1464374175189_0016 to be killed.
> 16/05/28 09:25:12 INFO impl.YarnClientImpl: Waiting for application application_1464374175189_0016 to be killed.
> 16/05/28 09:25:14 INFO impl.YarnClientImpl: Waiting for application application_1464374175189_0016 to be killed.
> 16/05/28 09:25:16 INFO impl.YarnClientImpl: Waiting for application application_1464374175189_0016 to be killed.
> I think it is probably a bug. It's hard to reproduce, but please review it for me.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
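The repeated "Waiting for application ... to be killed." lines appear to come from the client repeatedly asking the ResourceManager whether the kill has finished. The following is a minimal, hypothetical Java sketch of such a polling loop with an upper bound on the wait. `KillPoller`, `waitForKill` and the interval/timeout values are illustrative assumptions, not YarnClientImpl's actual implementation, and the sketch does not address the RM-side problem.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: KillPoller and waitForKill are illustrative names,
// not part of the YARN client API.
final class KillPoller {

    // Polls isKillCompleted every intervalMillis until it returns true or
    // timeoutMillis has elapsed. Returns true only if the kill completed in time.
    static boolean waitForKill(BooleanSupplier isKillCompleted,
                               long intervalMillis,
                               long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!isKillCompleted.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) {
                // Give up instead of printing "Waiting..." forever.
                return false;
            }
            System.out.println("Waiting for application to be killed.");
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Simulates an RM that never reports the kill as complete (the symptom in this issue).
        boolean stuck = waitForKill(() -> false, 10, 50);
        System.out.println("stuck app killed: " + stuck);

        // Simulates an RM that reports the kill as complete on the third poll.
        int[] polls = {0};
        boolean healthy = waitForKill(() -> ++polls[0] >= 3, 10, 1000);
        System.out.println("healthy app killed: " + healthy);
    }
}
```

A timeout like this only stops the client from waiting forever; the application itself would still sit in ACCEPTED until the RM actually processes the kill.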
[jira] [Commented] (YARN-5178) yarn application never can be killed when failover resource manager
[ https://issues.apache.org/jira/browse/YARN-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15305329#comment-15305329 ]

Jun Gong commented on YARN-5178:

Thanks [~tuyuri] for sharing the logs. I analyzed the rs2 log and found that it is the same problem as YARN-2856: the application was in the ACCEPTED state and ignored the event {{RMAppEventType.ATTEMPT_KILLED}}. You could try the patch.
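The diagnosis above can be illustrated with a toy state machine: if the application's current state has no transition for an incoming event, the event is dropped, the application never reaches KILLED, and a client waiting for the kill keeps polling. This is a hypothetical simplification; the enums, `SimpleApp` and `table` are invented for illustration, are far smaller than the real RMAppImpl transition table, and do not reproduce the YARN-2856 patch.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, heavily simplified model; not the real RMAppImpl API.
final class IgnoredEventDemo {

    // Illustrative subset of states and events.
    enum AppState { ACCEPTED, KILLED }

    enum AppEvent { ATTEMPT_KILLED }

    static final class SimpleApp {
        private AppState state = AppState.ACCEPTED;
        private final Map<AppState, Map<AppEvent, AppState>> transitions;

        SimpleApp(Map<AppState, Map<AppEvent, AppState>> transitions) {
            this.transitions = transitions;
        }

        AppState state() {
            return state;
        }

        // Applies the event if the current state has a transition for it;
        // otherwise the event is dropped and the state does not change.
        void handle(AppEvent event) {
            AppState next = transitions
                    .getOrDefault(state, Collections.emptyMap())
                    .get(event);
            if (next == null) {
                System.out.println("Ignoring " + event + " in state " + state);
                return;
            }
            state = next;
        }
    }

    // Builds a transition table with or without ACCEPTED --ATTEMPT_KILLED--> KILLED.
    static Map<AppState, Map<AppEvent, AppState>> table(boolean acceptedHandlesAttemptKilled) {
        Map<AppState, Map<AppEvent, AppState>> t = new EnumMap<>(AppState.class);
        Map<AppEvent, AppState> fromAccepted = new EnumMap<>(AppEvent.class);
        if (acceptedHandlesAttemptKilled) {
            fromAccepted.put(AppEvent.ATTEMPT_KILLED, AppState.KILLED);
        }
        t.put(AppState.ACCEPTED, fromAccepted);
        return t;
    }

    public static void main(String[] args) {
        SimpleApp withoutTransition = new SimpleApp(table(false));
        withoutTransition.handle(AppEvent.ATTEMPT_KILLED);
        System.out.println("without transition: " + withoutTransition.state()); // ACCEPTED

        SimpleApp withTransition = new SimpleApp(table(true));
        withTransition.handle(AppEvent.ATTEMPT_KILLED);
        System.out.println("with transition: " + withTransition.state()); // KILLED
    }
}
```

In the first case the application stays in ACCEPTED indefinitely, which matches the endless "Waiting for application ... to be killed." output reported in this issue.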
[jira] [Commented] (YARN-5178) yarn application never can be killed when failover resource manager
[ https://issues.apache.org/jira/browse/YARN-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15305163#comment-15305163 ]

tu nguyen khac commented on YARN-5178:
--

Sorry Jun Gong, it's my mistake: I didn't stop the cluster to collect the logs, and many other applications ran in the meantime, so the RS log is quite chaotic :D :D and hard to read. I have tried to attach it here anyway.
[jira] [Commented] (YARN-5178) yarn application never can be killed when failover resource manager
[ https://issues.apache.org/jira/browse/YARN-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15305129#comment-15305129 ]

Jun Gong commented on YARN-5178:

Thanks [~tuyuri] for reporting the issue. Could you please upload the logs of both RMs, if possible? It seems to be caused by the RMApp being in the ACCEPTED state when the RM failover happened, before any RMAppAttempt had been saved.