Hi all, 向社区求助一个HBase的scan异常导致RS宕机的问题,异常日志如下:
2023-01-19 03:19:06,986 ERROR [RpcServer.default.RWQ.Fifo.scan.handler=226,queue=19,port=60020] ipc.RpcServer: Unexpected throwable object java.lang.RuntimeException: Cached block contents differ, which should not have happened.cacheKey:bbec4ed53b6d475cbb8711f183556eb0_14145152 at org.apache.hadoop.hbase.io.hfile.BlockCacheUtil.validateBlockAddition(BlockCacheUtil.java:205) at org.apache.hadoop.hbase.io.hfile.BlockCacheUtil.shouldReplaceExistingCacheBlock(BlockCacheUtil.java:237) at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.shouldReplaceExistingCacheBlock(BucketCache.java:432) at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.cacheBlockWithWait(BucketCache.java:417) at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.cacheBlock(BucketCache.java:403) at org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.cacheBlock(CombinedBlockCache.java:68) at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.lambda$readBlock$2(HFileReaderImpl.java:1528) at java.util.Optional.ifPresent(Optional.java:159) at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1526) at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.readNextDataBlock(HFileReaderImpl.java:928) at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.isNextBlock(HFileReaderImpl.java:1061) at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.positionForNextBlock(HFileReaderImpl.java:1055) at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl._next(HFileReaderImpl.java:1073) at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.next(HFileReaderImpl.java:1094) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:351) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:244) at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:324) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:267) at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:1099) at org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:1088) at org.apache.hadoop.hbase.regionserver.StoreScanner.seekOrSkipToNextColumn(StoreScanner.java:823) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:730) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:157) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:6681) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6845) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:6615) at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3238) at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3483) at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42278) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:379) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318) 对应的源码(2.2.6): public static int validateBlockAddition(Cacheable existing, Cacheable newBlock, BlockCacheKey cacheKey) { int comparison = compareCacheBlock(existing, newBlock, false); if (comparison != 0) { throw new RuntimeException( "Cached block contents differ, which should not have happened." + "cacheKey:" + cacheKey); } if ((existing instanceof HFileBlock) && (newBlock instanceof HFileBlock)) { comparison = ((HFileBlock) existing).getNextBlockOnDiskSize() - ((HFileBlock) newBlock).getNextBlockOnDiskSize(); } return comparison; } scan操作的缓存置换过程中,触发了一个未捕获的RunTime异常,导致RS突然宕机,请教一下,什么情况会触发此异常,然后如何避免此异常的发生呢?