[ https://issues.apache.org/jira/browse/HDFS-7884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352591#comment-14352591 ]

Vinayakumar B edited comment on HDFS-7884 at 3/9/15 6:52 AM:
-------------------------------------------------------------

My earlier comment was a quick guess.

The actual reason is that the reader is trying to read using the block with the old 
generation stamp (blk_1073741911_1102), but the block's genstamp was bumped after the 
append.
{noformat}2015-03-09 02:03:21,488 INFO  impl.FsDatasetImpl 
(FsDatasetImpl.java:append(1015)) - Appending to FinalizedReplica, 
blk_1073741911_1102, FINALIZED{noformat}
{noformat}2015-03-09 02:03:21,501 INFO  namenode.FSNamesystem 
(FSNamesystem.java:updatePipeline(6199)) - updatePipeline(blk_1073741911_1102, 
newGS=1110, newLength=638, newNodes=[127.0.0.1:52069, 127.0.0.1:52065, 
127.0.0.1:52074], client=DFSClient_NONMAPREDUCE_-727094507_1){noformat}
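
To make the mismatch concrete, here is a small illustration (a sketch using 
o.a.h.hdfs.protocol.Block, not code from HDFS; the lengths are placeholders): the 
reader still holds genstamp 1102 while the replica on the DN is already at 1110.
{code}import org.apache.hadoop.hdfs.protocol.Block;

public class GenStampMismatch {
  public static void main(String[] args) {
    // Block as the reader still sees it: blk_1073741911_1102 (old genstamp)
    Block clientBlock = new Block(1073741911L, 512L, 1102L);
    // Replica on the DataNode after updatePipeline: genstamp 1110, length 638
    Block dnReplica = new Block(1073741911L, 638L, 1110L);

    System.out.println(clientBlock.getBlockId() == dnReplica.getBlockId());  // true
    // ReplicaMap#get requires exact genstamp equality, so this comparison fails
    System.out.println(
        clientBlock.getGenerationStamp() == dnReplica.getGenerationStamp()); // false
  }
}{code}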

While initializing the BlockSender, the genstamp is not checked when the replica is 
fetched:
{code}      synchronized(datanode.data) { 
        replica = getReplica(block, datanode);
        replicaVisibleLength = replica.getVisibleLength();
      }{code}

The replica's genstamp is checked against the client-passed genstamp only in one 
direction: an exception is thrown only when the client's genstamp is newer than the 
one the DN has. In the other case (replica newer than client) the code intends to 
allow the read.
{code}     if (replica.getGenerationStamp() < block.getGenerationStamp()) {
        throw new IOException("Replica gen stamp < block genstamp, block="
            + block + ", replica=" + replica);
      }{code}
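
Spelled out, that check covers three cases (a sketch of the intended behaviour, not 
existing code); the case hit in the log above is the replica-newer one:
{code}  // Sketch: how the one-sided genstamp check in BlockSender behaves
  static void checkGenStamp(long replicaGS, long clientGS) throws java.io.IOException {
    if (replicaGS < clientGS) {
      // Client's genstamp is newer than the replica's: refuse the read
      throw new java.io.IOException("Replica gen stamp < block genstamp");
    }
    // replicaGS == clientGS : normal read
    // replicaGS >  clientGS : the case in the log (replica=1110, client=1102);
    //                         the read is intended to proceed, but ReplicaMap#get
    //                         later rejects the stale genstamp
  }{code}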

But while getting the volume reference, the genstamp is checked for exact equality 
further down the line, in ReplicaMap#get:
{code}  ReplicaInfo get(String bpid, Block block) {
    checkBlockPool(bpid);
    checkBlock(block);
    ReplicaInfo replicaInfo = get(bpid, block.getBlockId());
    if (replicaInfo != null && 
        block.getGenerationStamp() == replicaInfo.getGenerationStamp()) {
      return replicaInfo;
    }
    return null;
  }{code}
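
So the chain leading to the NPE at BlockSender.java:264 is: ReplicaMap#get returns 
null for the stale genstamp, hence getVolume(block) returns null, and 
obtainReference() is called on that null. A condensed paraphrase of the flow (not 
the exact FsDatasetImpl code):
{code}      // Paraphrased flow, assuming getVolume resolves the replica via the ReplicaMap
      ReplicaInfo replicaInfo = volumeMap.get(bpid, block);  // null: GS 1102 != 1110
      FsVolumeSpi volume = (replicaInfo == null) ? null : replicaInfo.getVolume();
      this.volumeRef = volume.obtainReference();  // NPE at BlockSender.java:264{code}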

So I think, in this case, if the client read needs to go through, the genstamp of the 
client-provided block needs to be bumped up to the latest, like below.
{code}      if (replica.getGenerationStamp() < block.getGenerationStamp()) {
        throw new IOException("Replica gen stamp < block genstamp, block="
            + block + ", replica=" + replica);
      } else if (replica.getGenerationStamp() > block.getGenerationStamp()) {
        DataNode.LOG.debug("Bumping up the client provided"
            + " block's genstamp to latest " + replica.getGenerationStamp()
            + " for block " + block);
        block.setGenerationStamp(replica.getGenerationStamp());
      }{code}

Otherwise, an exception needs to be thrown from here itself.
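
If that alternative is preferred, a sketch of failing fast at the same place 
(assuming ReplicaNotFoundException's String constructor; not committed code) would be:
{code}      } else if (replica.getGenerationStamp() > block.getGenerationStamp()) {
        // Alternative: reject the stale-genstamp read instead of bumping it up
        throw new ReplicaNotFoundException("Replica gen stamp > block genstamp, block="
            + block + ", replica=" + replica);
      }{code}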

Any thoughts?


was (Author: vinayrpet):
I think it's just a race between the client read and the delete of the block.
The safe option is to null-check and throw ReplicaNotFoundException:
{code}      // Obtain a reference before reading data
      FsVolumeSpi volume = datanode.data.getVolume(block);
      if (volume == null) {
        // This is race b/n delete and read
        throw new ReplicaNotFoundException(block);
      }
      this.volumeRef = volume.obtainReference();{code}

> NullPointerException in BlockSender
> -----------------------------------
>
>                 Key: HDFS-7884
>                 URL: https://issues.apache.org/jira/browse/HDFS-7884
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Brahma Reddy Battula
>            Priority: Blocker
>         Attachments: 
> org.apache.hadoop.hdfs.TestAppendSnapshotTruncate-output.txt
>
>
> {noformat}
> java.lang.NullPointerException
>       at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:264)
>       at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:506)
>       at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
>       at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
>       at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:249)
>       at java.lang.Thread.run(Thread.java:745)
> {noformat}
> BlockSender.java:264 is shown below
> {code}
>       this.volumeRef = datanode.data.getVolume(block).obtainReference();
> {code}



