I found the cause of the error: a corrupt block. However, the length and generation stamp (GS) of the corrupt block are the same as those of a normal block, so HDFS cannot recognize it as corrupt. I excluded the DataNode that held the corrupt block.
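For reference, a minimal sketch (the HFile path below is hypothetical) of how the standard Hadoop FileSystem API can map the failing block offset from the error message to the DataNodes holding that HDFS block; something like this tells you which DataNode to exclude:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.BlockLocation;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class LocateBlockHosts {
      public static void main(String[] args) throws Exception {
          FileSystem fs = FileSystem.get(new Configuration());
          // Hypothetical HFile path; substitute the file named in the region server log.
          Path hfile = new Path("/hbase/data/default/t/region/cf/hfile");
          long failingOffset = 117193180315L; // "Block offset" from the error message
          // HDFS block locations covering the failing offset.
          BlockLocation[] locs = fs.getFileBlockLocations(
                  fs.getFileStatus(hfile), failingOffset, 1);
          for (BlockLocation loc : locs) {
              // Hosts that store the HDFS block containing the bad HFile block.
              System.out.println(String.join(",", loc.getHosts()));
          }
      }
  }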
Thanks.

On Mon, Jun 28, 2021 at 2:02 PM, Minwoo Kang <[email protected]> wrote:
> Hello,
>
> I ran into a strange issue, and I don't understand why it occurred.
> The error is "On-disk size without header provided is 65347, but block header
> contains -620432417. Block offset: 117193180315, data starts with:".
>
> Call stack:
> at org.apache.hadoop.hbase.io.hfile.HFileBlock.validateOnDiskSizeWithoutHeader(HFileBlock.java:521)
> at org.apache.hadoop.hbase.io.hfile.HFileBlock.access$700(HFileBlock.java:88)
> at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1671)
> at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1538)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:452)
>
> The HBase version is 1.2.6.
> I know 1.2.6 has reached EOL.
> (However, migrating to a new cluster is too hard; that is why I still operate this cluster.)
>
> It looks like a block is wrong. When the error occurred, every read request that
> touched that block failed to complete.
> I don't know how to resolve this. The only way I have found to resolve the issue is
> major compaction.
> After major compaction (meaning the block that looks wrong becomes an invalid block),
> read requests work fine again.
>
> I found the issue https://issues.apache.org/jira/browse/HBASE-20761.
> I am not sure it is related.
> But HBASE-20761 mentions "This will cause further attempts to read the
> block to fail since we will still retry the corrupt replica instead of
> reporting the corrupt replica and trying a different one.", which looks like
> a key to solving this issue.
>
> Our cluster configuration:
> - hbase.regionserver.checksum.verify=true
> - dfs.client.read.shortcircuit.skip.checksum=false
>
> Does anyone have a similar situation?
>
> Thanks.
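(For anyone hitting the same message: below is a rough, simplified sketch of the kind of consistency check behind "On-disk size without header provided is X, but block header contains Y". It is not the actual HBase source, and the header offset is illustrative. The idea is that the size the reader expects for a block is compared against the size field stored in that block's own header, so a garbled header yields a nonsense value like -620432417 and the read fails.)

  import java.io.IOException;
  import java.nio.ByteBuffer;

  public class BlockSizeCheckSketch {
      // Illustrative header layout: a 4-byte on-disk-size field after an 8-byte magic.
      static final int ON_DISK_SIZE_OFFSET = 8;

      static void validateOnDiskSizeWithoutHeader(ByteBuffer header,
                                                  int expectedSize,
                                                  long blockOffset) throws IOException {
          int sizeInHeader = header.getInt(ON_DISK_SIZE_OFFSET);
          if (sizeInHeader != expectedSize) {
              // A corrupt header yields an arbitrary value, so the check throws
              // and the read of this block never completes.
              throw new IOException("On-disk size without header provided is "
                      + expectedSize + ", but block header contains " + sizeInHeader
                      + ". Block offset: " + blockOffset);
          }
      }
  }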
