[ https://issues.apache.org/jira/browse/HDFS-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352591#comment-14352591 ]
Vinayakumar B edited comment on HDFS-7884 at 3/9/15 6:52 AM:
-------------------------------------------------------------

The earlier comment was a quick guess. The actual reason is that the reader is trying to read using the old block ID (blk_1073741911_1102), but the original block's genstamp was changed by the append:

{noformat}
2015-03-09 02:03:21,488 INFO impl.FsDatasetImpl (FsDatasetImpl.java:append(1015)) - Appending to FinalizedReplica, blk_1073741911_1102, FINALIZED
{noformat}

{noformat}
2015-03-09 02:03:21,501 INFO namenode.FSNamesystem (FSNamesystem.java:updatePipeline(6199)) - updatePipeline(blk_1073741911_1102, newGS=1110, newLength=638, newNodes=[127.0.0.1:52069, 127.0.0.1:52065, 127.0.0.1:52074], client=DFSClient_NONMAPREDUCE_-727094507_1)
{noformat}

When initializing the BlockSender, the genstamp is not checked while fetching the replica:

{code}
    synchronized(datanode.data) {
      replica = getReplica(block, datanode);
      replicaVisibleLength = replica.getVisibleLength();
    }
{code}

The genstamp is compared against the client-passed genstamp only in one direction: an exception is thrown only when the client's genstamp is newer than the DataNode's replica. In the other case (the DataNode's replica is newer than the client's), the code is meant to allow the read:

{code}
    if (replica.getGenerationStamp() < block.getGenerationStamp()) {
      throw new IOException("Replica gen stamp < block genstamp, block="
          + block + ", replica=" + replica);
    }
{code}

But while obtaining the volume reference, the genstamp is checked for exact equality down the line in ReplicaMap#get:

{code}
  ReplicaInfo get(String bpid, Block block) {
    checkBlockPool(bpid);
    checkBlock(block);
    ReplicaInfo replicaInfo = get(bpid, block.getBlockId());
    if (replicaInfo != null
        && block.getGenerationStamp() == replicaInfo.getGenerationStamp()) {
      return replicaInfo;
    }
    return null;
  }
{code}

So I think, if the client read should be allowed to go through in this case, the client-provided genstamp needs to be bumped up to the latest, like below:
{code}
    if (replica.getGenerationStamp() < block.getGenerationStamp()) {
      throw new IOException("Replica gen stamp < block genstamp, block="
          + block + ", replica=" + replica);
    } else if (replica.getGenerationStamp() > block.getGenerationStamp()) {
      DataNode.LOG.debug("Bumping up the client provided"
          + " block's genstamp to latest " + replica.getGenerationStamp()
          + " for block " + block);
      block.setGenerationStamp(replica.getGenerationStamp());
    }
{code}

Otherwise, an exception needs to be thrown from here itself. Any thoughts?

was (Author: vinayrpet):
I think it is just a race between the client read and deletion of the block. The safe option is to null-check and throw ReplicaNotFoundException:

{code}
    // Obtain a reference before reading data
    FsVolumeSpi volume = datanode.data.getVolume(block);
    if (volume == null) {
      // This is a race b/n delete and read
      throw new ReplicaNotFoundException(block);
    }
    this.volumeRef = volume.obtainReference();
{code}

> NullPointerException in BlockSender
> -----------------------------------
>
>                 Key: HDFS-7884
>                 URL: https://issues.apache.org/jira/browse/HDFS-7884
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Brahma Reddy Battula
>            Priority: Blocker
>         Attachments: org.apache.hadoop.hdfs.TestAppendSnapshotTruncate-output.txt
>
>
> {noformat}
> java.lang.NullPointerException
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:264)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:506)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:249)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> BlockSender.java:264 is shown below
> {code}
>     this.volumeRef = datanode.data.getVolume(block).obtainReference();
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
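To make the proposed genstamp reconciliation easy to follow outside the DataNode sources, here is a minimal standalone sketch. The `Block` class and `reconcile` method below are simplified stand-ins invented for illustration, not the actual HDFS classes; only the comparison logic mirrors the snippet proposed in the comment above.

```java
import java.io.IOException;

public class GenstampCheckSketch {

    // Simplified stand-in for org.apache.hadoop.hdfs.protocol.Block:
    // only the generation stamp matters for this sketch.
    static class Block {
        private long genStamp;
        Block(long gs) { this.genStamp = gs; }
        long getGenerationStamp() { return genStamp; }
        void setGenerationStamp(long gs) { this.genStamp = gs; }
    }

    // Hypothetical helper mirroring the proposed check: reject a stale
    // replica, but bump the client-provided genstamp when the on-disk
    // replica is newer (e.g. after an append updated the pipeline), so
    // that ReplicaMap#get's exact-equality check would then succeed.
    static void reconcile(Block clientBlock, Block replica) throws IOException {
        if (replica.getGenerationStamp() < clientBlock.getGenerationStamp()) {
            // Replica on the DataNode is older than what the client expects.
            throw new IOException("Replica gen stamp < block genstamp");
        } else if (replica.getGenerationStamp() > clientBlock.getGenerationStamp()) {
            // Replica is newer: align the client's view with the replica.
            clientBlock.setGenerationStamp(replica.getGenerationStamp());
        }
        // Equal genstamps: nothing to do, the read proceeds as-is.
    }

    public static void main(String[] args) throws IOException {
        // GS values taken from the logs in the comment: the client still
        // holds GS 1102, while the append moved the replica to GS 1110.
        Block client = new Block(1102);
        Block replica = new Block(1110);
        reconcile(client, replica);
        System.out.println(client.getGenerationStamp());
    }
}
```

Running the sketch prints `1110`: the client-side block is bumped to the replica's genstamp, which is exactly the behavior the proposed `else if` branch would add to BlockSender.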