Daryn Sharp created HDFS-13465:
----------------------------------

             Summary: Overlapping lease recoveries cause NPE in NN
                 Key: HDFS-13465
                 URL: https://issues.apache.org/jira/browse/HDFS-13465
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 2.8.0
            Reporter: Daryn Sharp

Overlapping lease recoveries for the same file cause an NPE in the DatanodeManager while it creates lease recovery commands, possibly losing other recovery commands.

* client1 calls recoverLease; the file is added to DN1's recovery queue.
* client2 calls recoverLease; the file is added to DN2's recovery queue.
* One DN heartbeats, gets the block recovery command, and completes the synchronization before the other DN heartbeats, i.e. the file is closed.
* The other DN heartbeats; the NN takes the block from that DN's recovery queue, assumes it is still under construction, and gets an NPE calling getExpectedStorageLocations because the block no longer has an under-construction feature.

{code:java}
//check lease recovery
BlockInfo[] blocks = nodeinfo.getLeaseRecoveryCommand(Integer.MAX_VALUE);
if (blocks != null) {
  BlockRecoveryCommand brCommand = new BlockRecoveryCommand(
      blocks.length);
  for (BlockInfo b : blocks) {
    BlockUnderConstructionFeature uc = b.getUnderConstructionFeature();
    // uc is null once the file has been closed by the other recovery
    assert uc != null;
    final DatanodeStorageInfo[] storages = uc.getExpectedStorageLocations();
{code}

This is "ok" for the NN state if only one block was queued. If multiple blocks were queued, all of the queued recoveries are lost. Recovery will not occur until the client explicitly retries or the lease monitor recovers the lease.
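
The failure mode can be reproduced in isolation with a small model. The sketch below is not Hadoop code and not a proposed patch: LeaseRecoverySketch, Block, UnderConstruction, buildUnguarded and buildGuarded are hypothetical stand-ins invented for illustration. It assumes that a closed block simply has no under-construction feature, and shows how one such block at the head of a DN's queue aborts the whole loop, while a null check (one possible mitigation, not necessarily the one that should be applied) lets the remaining recoveries through.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LeaseRecoverySketch {

    /** Hypothetical stand-in for BlockUnderConstructionFeature. */
    static final class UnderConstruction {
        private final String[] expectedStorages;

        UnderConstruction(String... expectedStorages) {
            this.expectedStorages = expectedStorages;
        }

        String[] getExpectedStorageLocations() {
            return expectedStorages;
        }
    }

    /** Hypothetical stand-in for BlockInfo. */
    static final class Block {
        final String name;
        private UnderConstruction uc; // null once the block is complete

        Block(String name, UnderConstruction uc) {
            this.name = name;
            this.uc = uc;
        }

        UnderConstruction getUnderConstructionFeature() {
            return uc;
        }

        /** Models the other DN finishing synchronization and the file closing. */
        void complete() {
            uc = null;
        }
    }

    /** Mirrors the shape of the quoted loop: no null check, so a completed block throws. */
    static List<String> buildUnguarded(List<Block> queued) {
        List<String> command = new ArrayList<>();
        for (Block b : queued) {
            UnderConstruction uc = b.getUnderConstructionFeature();
            // NullPointerException here if the block was completed in the meantime
            command.add(b.name + "->" + Arrays.toString(uc.getExpectedStorageLocations()));
        }
        return command;
    }

    /** Assumed mitigation: skip blocks that are no longer under construction. */
    static List<String> buildGuarded(List<Block> queued) {
        List<String> command = new ArrayList<>();
        for (Block b : queued) {
            UnderConstruction uc = b.getUnderConstructionFeature();
            if (uc == null) {
                continue; // recovery already finished elsewhere; nothing to send
            }
            command.add(b.name + "->" + Arrays.toString(uc.getExpectedStorageLocations()));
        }
        return command;
    }

    public static void main(String[] args) {
        Block first = new Block("blk_1", new UnderConstruction("DN1", "DN2"));
        Block second = new Block("blk_2", new UnderConstruction("DN2", "DN3"));
        first.complete(); // the overlapping recovery already closed this file

        List<Block> queue = Arrays.asList(first, second);
        try {
            buildUnguarded(queue);
        } catch (NullPointerException e) {
            System.out.println("unguarded: NPE, recovery of blk_2 was never issued");
        }
        System.out.println("guarded: " + buildGuarded(queue));
    }
}
```

In the unguarded variant the exception escapes before blk_2 is added to the command, which corresponds to the "all recoveries are lost" case described above. Whether silently skipping the completed block is the right behavior for the NN is a design question this sketch does not answer.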