[ https://issues.apache.org/jira/browse/YARN-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Giovanni Matteo Fumarola reassigned YARN-8337:
----------------------------------------------

    Assignee:     (was: Giovanni Matteo Fumarola)

> [FederationStateStore - MySql] Deadlock In addApplicationHome
> -------------------------------------------------------------
>
>                 Key: YARN-8337
>                 URL: https://issues.apache.org/jira/browse/YARN-8337
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: federation, router
>            Reporter: Jianchao Jia
>            Priority: Major
>         Attachments: YARN-8337.001.patch, YARN-8337.002.patch
>
>
> We use MySQL InnoDB as the state store engine. In the router log we found a
> deadlock error like the one below:
> {code:java}
> [2018-05-21T15:41:40.383+08:00] [ERROR] [IPC Server handler 25 on 8050] : Unable to insert the newly generated application application_1526295230627_127402
> com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
>     at sun.reflect.GeneratedConstructorAccessor107.newInstance(Unknown Source)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>     at com.mysql.jdbc.Util.handleNewInstance(Util.java:425)
>     at com.mysql.jdbc.Util.getInstance(Util.java:408)
>     at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:952)
>     at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973)
>     at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909)
>     at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527)
>     at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680)
>     at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484)
>     at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
>     at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2079)
>     at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2013)
>     at com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5104)
>     at com.mysql.jdbc.CallableStatement.executeLargeUpdate(CallableStatement.java:2418)
>     at com.mysql.jdbc.CallableStatement.executeUpdate(CallableStatement.java:887)
>     at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61)
>     at com.zaxxer.hikari.pool.HikariProxyCallableStatement.executeUpdate(HikariProxyCallableStatement.java)
>     at org.apache.hadoop.yarn.server.federation.store.impl.SQLFederationStateStore.addApplicationHomeSubCluster(SQLFederationStateStore.java:547)
> {code}
> Running "show engine innodb status;" shows what happened:
> {code:java}
> 2018-05-21 15:41:40 7f4685870700
> *** (1) TRANSACTION:
> TRANSACTION 241131538, ACTIVE 0 sec inserting, thread declared inside InnoDB 4999
> mysql tables in use 2, locked 2
> LOCK WAIT 4 lock struct(s), heap size 1184, 2 row lock(s)
> MySQL thread id 7602335, OS thread handle 0x7f46858f2700, query id 2919792534 192.168.1.138 federation executing
> INSERT INTO applicationsHomeSubCluster
> (applicationId,homeSubCluster)
> (SELECT applicationId_IN, homeSubCluster_IN
> FROM applicationsHomeSubCluster
> WHERE applicationId = applicationId_IN
> HAVING COUNT(*) = 0 )
> *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
> RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table `guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131538 lock_mode X locks gap before rec insert intention waiting
> Record lock, heap no 23 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
> 0: len 30; hex 6170706c69636174696f6e5f313532363239353233303632375f31323734; asc application_1526295230627_1274; (total 31 bytes);
> 1: len 6; hex 00000ba5f32d; asc -;;
> 2: len 7; hex dd000000280110; asc ( ;;
> 3: len 13; hex 686f70655f636c757374657231; asc hope_cluster1;;
> *** (2) TRANSACTION:
> TRANSACTION 241131539, ACTIVE 0 sec inserting, thread declared inside InnoDB 4999
> mysql tables in use 2, locked 2
> 4 lock struct(s), heap size 1184, 2 row lock(s)
> MySQL thread id 7600638, OS thread handle 0x7f4685870700, query id 2919792535 192.168.1.138 federation executing
> INSERT INTO applicationsHomeSubCluster
> (applicationId,homeSubCluster)
> (SELECT applicationId_IN, homeSubCluster_IN
> FROM applicationsHomeSubCluster
> WHERE applicationId = applicationId_IN
> HAVING COUNT(*) = 0 )
> *** (2) HOLDS THE LOCK(S):
> RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table `guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131539 lock mode S locks gap before rec
> Record lock, heap no 23 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
> 0: len 30; hex 6170706c69636174696f6e5f313532363239353233303632375f31323734; asc application_1526295230627_1274; (total 31 bytes);
> 1: len 6; hex 00000ba5f32d; asc -;;
> 2: len 7; hex dd000000280110; asc ( ;;
> 3: len 13; hex 686f70655f636c757374657231; asc hope_cluster1;;
> *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
> RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table `guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131539 lock_mode X locks gap before rec insert intention waiting
> Record lock, heap no 23 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
> 0: len 30; hex 6170706c69636174696f6e5f313532363239353233303632375f31323734; asc application_1526295230627_1274; (total 31 bytes);
> 1: len 6; hex 00000ba5f32d; asc -;;
> 2: len 7; hex dd000000280110; asc ( ;;
> 3: len 13; hex 686f70655f636c757374657231; asc hope_cluster1;;
> *** WE ROLL BACK TRANSACTION (2)
> {code}
> The procedure sp_addApplicationHomeSubCluster creates a gap lock.
> When the WHERE clause of its INSERT INTO ... SELECT does not find the
> applicationId in table applicationsHomeSubCluster, the SELECT part takes a
> shared gap lock (the "lock mode S locks gap before rec" entry above).
> Gap locks held by different sessions do not conflict, so several sessions can
> hold this lock at the same time. If other threads then try to insert new
> records into the same gap, each insert intention lock waits on the other
> session's gap lock, and a deadlock may occur.
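The lock interaction described above can be illustrated with a toy model. This is only a sketch, not InnoDB code: the class and method names (`GapLockModel`, `acquireSharedGapLock`, `requestInsertIntention`, `hasDeadlock`) are hypothetical. It assumes just two rules: gap locks held by different transactions are compatible with each other, and an insert intention lock has to wait for gap locks that other transactions hold on the same gap.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of a single index gap; not InnoDB's actual lock manager. */
public class GapLockModel {
    // Transactions currently holding a shared gap lock on the gap.
    private final Set<Long> gapHolders = new HashSet<>();
    // Wait-for graph: waiting transaction -> transactions it waits on.
    private final Map<Long, Set<Long>> waitsFor = new HashMap<>();

    /** Gap locks never conflict with each other, so this always succeeds. */
    public void acquireSharedGapLock(long trxId) {
        gapHolders.add(trxId);
    }

    /**
     * An insert intention lock must wait for gap locks held by OTHER
     * transactions. Returns true if granted at once, false if trxId waits.
     */
    public boolean requestInsertIntention(long trxId) {
        Set<Long> blockers = new HashSet<>(gapHolders);
        blockers.remove(trxId);
        if (blockers.isEmpty()) {
            return true;
        }
        waitsFor.put(trxId, blockers);
        return false;
    }

    /** Searches the wait-for graph for a cycle through any waiting trx. */
    public boolean hasDeadlock() {
        for (Long start : waitsFor.keySet()) {
            Deque<Long> stack = new ArrayDeque<>(waitsFor.get(start));
            Set<Long> seen = new HashSet<>();
            while (!stack.isEmpty()) {
                Long next = stack.pop();
                if (next.equals(start)) {
                    return true;
                }
                if (seen.add(next) && waitsFor.containsKey(next)) {
                    stack.addAll(waitsFor.get(next));
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        GapLockModel gap = new GapLockModel();
        // Both sessions run the SELECT part first and lock the same gap.
        gap.acquireSharedGapLock(241131538L);
        gap.acquireSharedGapLock(241131539L);
        // Each INSERT then waits on the other session's gap lock.
        gap.requestInsertIntention(241131538L);
        gap.requestInsertIntention(241131539L);
        System.out.println("deadlock: " + gap.hasDeadlock());
    }
}
```

In this model a single session never blocks itself; the cycle only appears once two sessions have both taken the gap lock before either of them inserts, which matches the two transactions in the InnoDB status output.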
>
> To reproduce the situation, we used 3 sessions to insert different
> applicationIds: application_1526528662010_001201,
> application_1526528662010_001202, application_1526528662010_001203
> !http://bit.jd.com/zhangmang/JDHadoop-2.7.1/uploads/78912e6d245b8009052fb24e73cbaf54/image.png!
>
> To fix this issue, we should use INSERT IGNORE INTO instead of INSERT INTO
> ... SELECT.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
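Independently of the SQL change proposed in the issue, the MySQL error text itself suggests restarting the transaction. The following is a minimal sketch of a caller-side retry, under the assumption that re-running the insert is safe. `DeadlockRetry`, `isDeadlock` and `run` are hypothetical names, not part of SQLFederationStateStore or of the attached patches. The check relies on MySQL reporting a deadlock as vendor error code 1213 with SQLSTATE 40001.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

/** Hypothetical helper; not part of SQLFederationStateStore. */
public final class DeadlockRetry {
    // MySQL reports ER_LOCK_DEADLOCK as vendor code 1213, SQLSTATE 40001.
    private static final int MYSQL_ER_LOCK_DEADLOCK = 1213;
    private static final String SQLSTATE_SERIALIZATION_FAILURE = "40001";

    private DeadlockRetry() { }

    /** True if the exception looks like a deadlock-induced rollback. */
    public static boolean isDeadlock(SQLException e) {
        return e.getErrorCode() == MYSQL_ER_LOCK_DEADLOCK
            || SQLSTATE_SERIALIZATION_FAILURE.equals(e.getSQLState());
    }

    /**
     * Runs the operation and re-runs it after a deadlock rollback, up to
     * maxAttempts attempts in total. Other failures propagate immediately.
     */
    public static <T> T run(Callable<T> operation, int maxAttempts)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.call();
            } catch (SQLException e) {
                if (!isDeadlock(e) || attempt >= maxAttempts) {
                    throw e;
                }
                // Simple linear backoff so competing sessions can finish.
                Thread.sleep(10L * attempt);
            }
        }
    }
}
```

A caller could wrap the stored-procedure call in `DeadlockRetry.run(() -> statement.executeUpdate(), 3)`, where `statement` stands for whatever callable statement the store already prepares. A retry only hides the symptom; removing the gap lock, as the issue proposes, addresses the cause.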