Wei-Chiu Chuang created HDFS-16161:
--------------------------------------

             Summary: Corrupt block checksum is not reported to NameNode
                 Key: HDFS-16161
                 URL: https://issues.apache.org/jira/browse/HDFS-16161
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
            Reporter: Wei-Chiu Chuang


One of our users reported this error in the log:

{noformat}
2021-07-30 09:51:27,509 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
an02nphda5777.npa.bfsiplatform.com:1004:DataXceiver error processing READ_BLOCK 
operation  src: /10.30.10.68:35680 dst: /10.30.10.67:1004
java.lang.IllegalArgumentException: id=-46 out of range [0, 5)
        at 
org.apache.hadoop.util.DataChecksum$Type.valueOf(DataChecksum.java:76)
        at 
org.apache.hadoop.util.DataChecksum.newDataChecksum(DataChecksum.java:167)
{noformat}

Analysis:
It looks like the first few bytes of the checksum were bad. These bytes 
determine the checksum type (CRC32, CRC32C, etc.).
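The failure mode can be illustrated with a minimal sketch. The enum below mirrors the idea of DataChecksum.Type (the ids 0–4 match the "[0, 5)" range in the log), but it is a simplified stand-in, not Hadoop's actual code:

```java
// Sketch: how a corrupt type byte in the checksum header blows up with
// IllegalArgumentException instead of an IOException.
public class ChecksumTypeDemo {
    // Simplified stand-in for o.a.h.util.DataChecksum.Type.
    enum Type {
        NULL, CRC32, CRC32C, DEFAULT, MIXED;

        static Type valueOf(int id) {
            if (id < 0 || id >= values().length) {
                // This RuntimeException is what the DataNode log shows.
                throw new IllegalArgumentException(
                    "id=" + id + " out of range [0, " + values().length + ")");
            }
            return values()[id];
        }
    }

    public static void main(String[] args) {
        System.out.println(Type.valueOf(1)); // a valid header byte: CRC32
        try {
            Type.valueOf(-46); // the corrupt header byte from the report
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a garbage byte such as -46, the lookup fails before any data is read, and the exception escapes the IOException-based corruption handling.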

If the DN hits an IOException while reading a block, it starts another thread 
to scan the block. If the block is indeed bad, it reports the bad block to the 
NN. But here an IllegalArgumentException is thrown, which is a 
RuntimeException rather than an IOException, so it is not handled that way.

This is a bug in the error handling code; it should fail more gracefully.

Suggestion: catch the IllegalArgumentException in 
BlockMetadataHeader.preadHeader() and throw CorruptMetaHeaderException, so 
that the DN catches the exception and performs the regular block scan check.
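A sketch of what that handling could look like. The class and method names below mirror the report, but the bodies are hypothetical stand-ins, not the actual Hadoop patch:

```java
import java.io.IOException;

public class PreadHeaderSketch {
    // Stand-in for the proposed CorruptMetaHeaderException: extending
    // IOException is the key point, so the DN's existing IOE handling fires.
    static class CorruptMetaHeaderException extends IOException {
        CorruptMetaHeaderException(String msg, Throwable cause) {
            super(msg, cause);
        }
    }

    // Stand-in for BlockMetadataHeader.preadHeader(): parsing the raw type
    // byte may throw IllegalArgumentException when the header is corrupt.
    static String preadHeader(int rawTypeId) throws IOException {
        try {
            return parseChecksumType(rawTypeId);
        } catch (IllegalArgumentException e) {
            // Wrap the RuntimeException in an IOException subclass so the
            // caller treats it as block corruption and schedules a scan.
            throw new CorruptMetaHeaderException(
                "Corrupt block metadata header", e);
        }
    }

    // Simplified checksum-type lookup, same failure shape as the log.
    private static String parseChecksumType(int id) {
        String[] types = {"NULL", "CRC32", "CRC32C", "DEFAULT", "MIXED"};
        if (id < 0 || id >= types.length) {
            throw new IllegalArgumentException(
                "id=" + id + " out of range [0, " + types.length + ")");
        }
        return types[id];
    }

    public static void main(String[] args) throws IOException {
        System.out.println(preadHeader(2)); // valid header: CRC32C
        try {
            preadHeader(-46); // corrupt header byte
        } catch (CorruptMetaHeaderException e) {
            System.out.println("caught as IOException: " + e.getMessage());
        }
    }
}
```

Because CorruptMetaHeaderException is an IOException, the existing DataXceiver error path would see it, trigger the block scan, and ultimately report the corrupt replica to the NameNode.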



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
