Wei-Chiu Chuang created HDFS-16161:
--------------------------------------

             Summary: Corrupt block checksum is not reported to NameNode
                 Key: HDFS-16161
                 URL: https://issues.apache.org/jira/browse/HDFS-16161
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
            Reporter: Wei-Chiu Chuang
One of our users reported this error in the log:

{noformat}
2021-07-30 09:51:27,509 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: an02nphda5777.npa.bfsiplatform.com:1004:DataXceiver error processing READ_BLOCK operation src: /10.30.10.68:35680 dst: /10.30.10.67:1004
java.lang.IllegalArgumentException: id=-46 out of range [0, 5)
        at org.apache.hadoop.util.DataChecksum$Type.valueOf(DataChecksum.java:76)
        at org.apache.hadoop.util.DataChecksum.newDataChecksum(DataChecksum.java:167)
{noformat}

Analysis: it looks like the first few bytes of the checksum metadata were bad. The first few bytes determine the checksum type (CRC32, CRC32C, etc.). If the DN throws an IOException while reading a block, it starts another thread to scan the block; if the block is indeed bad, it reports the bad block to the NN. But this is an IllegalArgumentException, which is a RuntimeException rather than an IOException, so it is not handled that way. It's a bug in the error handling code, and it should be made more graceful.

Suggestion: catch the IllegalArgumentException in BlockMetadataHeader.preadHeader() and throw CorruptMetaHeaderException, so that the DN catches the exception and performs the regular block scan check.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
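To illustrate the proposed fix, here is a minimal, self-contained sketch (not the actual Hadoop code): a hypothetical analogue of DataChecksum.Type.valueOf() that rejects out-of-range ids with an IllegalArgumentException, and a header-reading wrapper that translates that RuntimeException into an IOException subclass, modeled on CorruptMetaHeaderException, so the DN's existing IOException-based corruption path can handle it. The class and method names below are illustrative assumptions, not the real HDFS signatures.

```java
import java.io.IOException;

public class CorruptHeaderDemo {
    // Mirrors the five checksum types implied by "out of range [0, 5)" in the log.
    enum ChecksumType { NULL, CRC32, CRC32C, DEFAULT, MIXED }

    // Hypothetical analogue of DataChecksum.Type.valueOf(): throws an
    // IllegalArgumentException (a RuntimeException) for a corrupt type byte.
    static ChecksumType typeOf(int id) {
        if (id < 0 || id >= ChecksumType.values().length) {
            throw new IllegalArgumentException("id=" + id + " out of range [0, 5)");
        }
        return ChecksumType.values()[id];
    }

    // Hypothetical analogue of CorruptMetaHeaderException: an IOException
    // subclass, so the caller's existing IOException handling applies.
    static class CorruptMetaHeaderException extends IOException {
        CorruptMetaHeaderException(String msg, Throwable cause) { super(msg, cause); }
    }

    // Shape of the suggested fix in BlockMetadataHeader.preadHeader():
    // translate the RuntimeException into a checked IOException.
    static ChecksumType readHeaderType(byte firstByte) throws IOException {
        try {
            return typeOf(firstByte);
        } catch (IllegalArgumentException e) {
            throw new CorruptMetaHeaderException("corrupt checksum meta header", e);
        }
    }

    public static void main(String[] args) {
        try {
            readHeaderType((byte) -46); // the corrupt value seen in the log
        } catch (IOException e) {
            // With the fix, corruption surfaces as an IOException, so the DN
            // can trigger the block scan and report the bad block to the NN.
            System.out.println("caught IOException: " + e.getMessage());
        }
    }
}
```

With the unpatched shape (no try/catch), the IllegalArgumentException would propagate past any `catch (IOException e)` block unhandled, which is exactly the failure mode described above.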