[ https://issues.apache.org/jira/browse/HBASE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Purtell reopened HBASE-17069: ------------------------------------ > RegionServer writes invalid META entries for split daughters in some > circumstances > ---------------------------------------------------------------------------------- > > Key: HBASE-17069 > URL: https://issues.apache.org/jira/browse/HBASE-17069 > Project: HBase > Issue Type: Bug > Components: wal > Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.2.4 > Reporter: Andrew Purtell > Assignee: Abhishek Singh Chouhan > Priority: Blocker > Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.5 > > Attachments: daughter_1_d55ef81c2f8299abbddfce0445067830.log, > daughter_2_08629d59564726da2497f70451aafcdb.log, > HBASE-17069.branch-1.3.001.patch, HBASE-17069.branch-1.3.002.patch, > HBASE-17069.master.001.patch, logs.tar.gz, > parent-393d2bfd8b1c52ce08540306659624f2.log > > > I have been seeing frequent ITBLL failures testing various versions of 1.2.x. > Over the lifetime of 1.2.x the following issues have been fixed: > - HBASE-15315 (Remove always set super user call as high priority) > - HBASE-16093 (Fix splits failed before creating daughter regions leave meta > inconsistent) > And this one is pending: > - HBASE-17044 (Fix merge failed before creating merged region leaves meta > inconsistent) > I can apply all of the above to branch-1.2 and still see this failure: > *The life of stillborn region d55ef81c2f8299abbddfce0445067830* > *Master sees SPLITTING_NEW* > {noformat} > 2016-11-08 04:23:21,186 INFO [AM.ZK.Worker-pool2-t82] master.RegionStates: > Transition null to {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, > ts=1478579001186, server=node-3.cluster,16020,1478578389506} > {noformat} > *The RegionServer creates it* > {noformat} > 2016-11-08 04:23:26,035 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for GomnU: blockCache=LruBlockCache{blockCount=34, > currentSize=14996112, freeSize=12823716208, maxSize=12838712320, > heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,038 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for big: blockCache=LruBlockCache{blockCount=34, > currentSize=14996112, freeSize=12823716208, maxSize=12838712320, > heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,442 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for meta: blockCache=LruBlockCache{blockCount=63, > currentSize=17187656, freeSize=12821524664, maxSize=12838712320, > heapSize=17187656, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,713 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for nwmrW: blockCache=LruBlockCache{blockCount=96, > currentSize=19178440, freeSize=12819533880, maxSize=12838712320, > heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,715 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for piwbr: blockCache=LruBlockCache{blockCount=96, > currentSize=19178440, freeSize=12819533880, maxSize=12838712320, > heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,717 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for tiny: blockCache=LruBlockCache{blockCount=96, > currentSize=19178440, freeSize=12819533880, maxSize=12838712320, > heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > {noformat} > *The RegionServer onlines it* > {noformat} > 2016-11-08 04:23:27,015 INFO > [node-3.cluster,16020,1478578389506-daughterOpener=d55ef81c2f8299abbddfce0445067830] > regionserver.HRegion: Onlined d55ef81c2f8299abbddfce0445067830; next > sequenceid=19184 > 2016-11-08 04:23:27,029 INFO > [regionserver/node-3.cluster/192.168.124.4:16020-splits-1478579001099] > regionserver.HRegionServer: Post open deploy tasks for > IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830. > 2016-11-08 04:23:27,047 INFO > [regionserver/node-3.cluster/192.168.124.4:16020-splits-1478579001099] > hbase.MetaTableAccessor: Updated row > IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830. > with server=node-3.cluster,16020,1478578389506 > {noformat} > *The Master transitions state from SPLITTING_NEW to OPEN* > {noformat} > 2016-11-08 04:23:27,058 INFO [AM.ZK.Worker-pool2-t84] master.RegionStates: > Transition {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, > ts=1478579007057, server=node-3.cluster,16020,1478578389506} to > {d55ef81c2f8299abbddfce0445067830 state=OPEN, ts=1478579007058, > server=node-3.cluster,16020,1478578389506} > 2016-11-08 04:23:27,059 INFO [AM.ZK.Worker-pool2-t84] > master.AssignmentManager: Handled SPLIT event; > parent=IntegrationTestBigLinkedList,,1478577020916.393d2bfd8b1c52ce08540306659624f2., > daughter > a=IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830., > daughter > b=IntegrationTestBigLinkedList,/\xFB\x14,1478579001155.08629d59564726da2497f70451aafcdb., > on node-3.cluster,16020,1478578389506 > {noformat} > *RegionServer updates META - BUT APPARENTLY NOT CORRECTLY* > {noformat} > 2016-11-08 04:23:27,165 INFO > [regionserver/node-3.cluster/192.168.124.4:16020-splits-1478579001099] > regionserver.SplitRequest: Region split, hbase:meta updated, and report to > master. > Parent=IntegrationTestBigLinkedList,,1478577020916.393d2bfd8b1c52ce08540306659624f2., > new regions: > IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830., > > IntegrationTestBigLinkedList,/\xFB\x14,1478579001155.08629d59564726da2497f70451aafcdb.. > Split took 6sec > {noformat} > *RegionServer delays flush* > (Is this important?) > {noformat} > 2016-11-08 04:24:14,639 WARN [MemStoreFlusher.0] > regionserver.MemStoreFlusher: Region > IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830. > has too many store files; delaying flush up to 90000ms > {noformat} > *Immediate warnings about No serialized HRegionInfo* > {noformat} > 2016-11-08 04:24:44,691 WARN > [B.defaultRpcServer.handler=26,queue=2,port=16000] hbase.MetaTableAccessor: > No serialized HRegionInfo in > keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478579007029/Put/vlen=8/seqid=0, > > IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478579007029/Put/vlen=20/seqid=0, > > IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478579007029/Put/vlen=8/seqid=0} > {noformat} > *Master is not happy either* > {noformat} > 2016-11-08 04:24:51,148 WARN [MASTER_TABLE_OPERATIONS-node-1:16000-0] > hbase.MetaTableAccessor: No serialized HRegionInfo in > keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478579007029/Put/vlen=8/seqid=0, > > IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478579007029/Put/vlen=20/seqid=0, > > IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478579007029/Put/vlen=8/seqid=0} > {noformat} > *TestRunner MetaScanner complains about invalid entries in META missing > HRegionInfo* > {noformat} > (standard input):9086:2016-11-08 05:04:17,230 WARN > [B.defaultRpcServer.handler=4,queue=1,port=16000] hbase.MetaTableAccessor: No > serialized HRegionInfo in > keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478581041080/Put/vlen=8/seqid=0, > > IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478581041080/Put/vlen=20/seqid=0, > > IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478581041080/Put/vlen=8/seqid=0} > {noformat} > *ITBLL MapReduce tasks fail because part of the keyspace cannot be located:* > {noformat} > java.io.IOException: HRegionInfo was null in IntegrationTestBigLinkedList, > row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478581041080/Put/vlen=8/seqid=0, > > IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478581041080/Put/vlen=20/seqid=0, > > IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478581041080/Put/vlen=8/seqid=0} > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1293) > at > org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1185) > at > org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:410) > at > org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:359) > at > org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:238) > at > org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:154) > at > org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:121) > at > org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.persist(IntegrationTestBigLinkedList.java:486) > at > org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.map(IntegrationTestBigLinkedList.java:431) > at > org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator$GeneratorMapper.map(IntegrationTestBigLinkedList.java:375) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1719) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164) > {noformat} > {noformat} > ./application_1478574724776_0002/container_1478574724776_0002_01_000008/syslog:920:java.io.IOException: > HRegionInfo was null in IntegrationTestBigLinkedList, > row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478580288482/Put/vlen=8/seqid=0, > > IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478580288482/Put/vlen=20/seqid=0, > > IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478580288482/Put/vlen=8/seqid=0} > {noformat} > {noformat} > ./application_1478574724776_0002/container_1478574724776_0002_01_000010/syslog:920:java.io.IOException: > HRegionInfo was null in IntegrationTestBigLinkedList, > row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478580288482/Put/vlen=8/seqid=0, > > IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478580288482/Put/vlen=20/seqid=0, > > IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478580288482/Put/vlen=8/seqid=0} > {noformat} > {noformat} > ./application_1478574724776_0002/container_1478574724776_0002_01_000011/syslog:909:java.io.IOException: > HRegionInfo was null in IntegrationTestBigLinkedList, > row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478580288482/Put/vlen=8/seqid=0, > > IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478580288482/Put/vlen=20/seqid=0, > > IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478580288482/Put/vlen=8/seqid=0} > {noformat} > {noformat} > ./application_1478574724776_0002/container_1478574724776_0002_01_000030/syslog:48:java.io.IOException: > HRegionInfo was null in IntegrationTestBigLinkedList, > row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478581041080/Put/vlen=8/seqid=0, > > IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478581041080/Put/vlen=20/seqid=0, > > IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478581041080/Put/vlen=8/seqid=0} > {noformat} > {noformat} > ./application_1478574724776_0002/container_1478574724776_0002_01_000048/syslog:48:java.io.IOException: > HRegionInfo was null in IntegrationTestBigLinkedList, > row=keyvalues={IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:seqnumDuringOpen/1478581041080/Put/vlen=8/seqid=0, > > IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:server/1478581041080/Put/vlen=20/seqid=0, > > IntegrationTestBigLinkedList,,1478579001155.d55ef81c2f8299abbddfce0445067830./info:serverstartcode/1478581041080/Put/vlen=8/seqid=0} > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)