Xiaolin Ha created HBASE-25507:
----------------------------------

             Summary: Leak of ESTABLISHED sockets when encountered "java.io.IOException: Invalid HFile block magic"
                 Key: HBASE-25507
                 URL: https://issues.apache.org/jira/browse/HBASE-25507
             Project: HBase
          Issue Type: Improvement
            Reporter: Xiaolin Ha
            Assignee: Xiaolin Ha
         Attachments: errorlogs.png, increasing-of-established-sockets-image.png, problem-region-move-logs.png

Recently, we found socket leaks on our production cluster. The leaked sockets are in the ESTABLISHED state.

From our analysis of monitoring metrics and logs, the leak occurs only on the RS that owns one particular region. RSes without this region work normally.

On the RS that owns this region, we found exceptions like the following:
{code:java}
java.io.IOException: java.io.IOException: Could not seek StoreFileScanner[org.apache.hadoop.hbase.io.HalfStoreFileReader$1@5388be2f, cur=null] to key org.apache.hadoop.hbase.CellUtil$FirstOnRowDeleteFamilyCell@25aa56fd
    at org.apache.hadoop.hbase.regionserver.StoreScanner.parallelSeek(StoreScanner.java:1128)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:437)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:329)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:302)
    at org.apache.hadoop.hbase.regionserver.compactions.Compactor.createScanner(Compactor.java:806)
    at org.apache.hadoop.hbase.regionserver.compactions.StripeCompactor$StripeInternalScannerFactory.createScanner(StripeCompactor.java:82)
    at org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:316)
    at org.apache.hadoop.hbase.regionserver.compactions.StripeCompactor.compact(StripeCompactor.java:120)
    at org.apache.hadoop.hbase.regionserver.compactions.StripeCompactionPolicy$SplitStripeCompactionRequest.execute(StripeCompactionPolicy.java:662)
    at org.apache.hadoop.hbase.regionserver.StripeStoreEngine$StripeCompaction.compact(StripeStoreEngine.java:114)
    at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1461)
    at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2121)
    at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:519)
    at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:555)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Could not seek StoreFileScanner[org.apache.hadoop.hbase.io.HalfStoreFileReader$1@5388be2f, cur=null] to key org.apache.hadoop.hbase.CellUtil$FirstOnRowDeleteFamilyCell@25aa56fd
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:229)
    at org.apache.hadoop.hbase.regionserver.handler.ParallelSeekHandler.process(ParallelSeekHandler.java:56)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
    ... 3 more
Caused by: java.io.IOException: Invalid HFile block magic: \x00\x00\x00\x00\x00\x00\x00\x00
    at org.apache.hadoop.hbase.io.hfile.BlockType.parse(BlockType.java:159)
    at org.apache.hadoop.hbase.io.hfile.BlockType.read(BlockType.java:171)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock.createFromBuff(HFileBlock.java:333)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1753)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1552)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:539)
    at org.apache.hadoop.hbase.io.hfile.HFileScannerImpl.readAndUpdateNewBlock(HFileScannerImpl.java:737)
    at org.apache.hadoop.hbase.io.hfile.HFileScannerImpl.seekTo(HFileScannerImpl.java:726)
    at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekTo(HalfStoreFileReader.java:161)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:315)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:211)
    ... 5 more
{code}
The count of ESTABLISHED sockets keeps increasing, as the attached pictures show:

!increasing-of-established-sockets-image.png!

!problem-region-move-logs.png!

!errorlogs.png!
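
This report does not state the root cause. One plausible reading of the trace, offered only as an assumption, is that scanners opened before the failed seek are never closed on the exception path, so their underlying streams (and sockets) stay open. The following is a minimal sketch of the general cleanup pattern and not HBase code: {{SeekCleanupSketch}}, {{FakeScanner}}, {{seekAll}} and {{countOpen}} are hypothetical names invented for illustration.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: none of these types are HBase classes. */
class SeekCleanupSketch {

    /** Hypothetical stand-in for a scanner that holds an open stream. */
    static class FakeScanner implements Closeable {
        private final boolean corrupt;
        private boolean closed = false;

        FakeScanner(boolean corrupt) {
            this.corrupt = corrupt;
        }

        /** Simulates a seek that fails on a corrupt block. */
        void seek() throws IOException {
            if (corrupt) {
                throw new IOException("Invalid HFile block magic (simulated)");
            }
        }

        boolean isClosed() {
            return closed;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Seeks every scanner; if any seek fails, closes all of them before rethrowing. */
    static void seekAll(List<FakeScanner> scanners) throws IOException {
        try {
            for (FakeScanner scanner : scanners) {
                scanner.seek();
            }
        } catch (IOException e) {
            // Assumed fix: release every scanner on the failure path,
            // including the ones whose seek had already succeeded.
            for (FakeScanner scanner : scanners) {
                scanner.close();
            }
            throw e;
        }
    }

    /** Counts scanners that are still open. */
    static int countOpen(List<FakeScanner> scanners) {
        int open = 0;
        for (FakeScanner scanner : scanners) {
            if (!scanner.isClosed()) {
                open++;
            }
        }
        return open;
    }

    public static void main(String[] args) {
        List<FakeScanner> scanners = new ArrayList<>();
        scanners.add(new FakeScanner(false));
        scanners.add(new FakeScanner(true)); // this one fails to seek
        try {
            seekAll(scanners);
        } catch (IOException e) {
            System.out.println("seek failed: " + e.getMessage());
        }
        System.out.println("open scanners after failure: " + countOpen(scanners));
    }
}
```

Without the {{catch}} block in {{seekAll}}, the first scanner in this sketch would stay open after the second one fails, which mirrors the kind of leak described above. Whether the real leak follows this path still needs to be confirmed against the RS code.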