Lei Yang created HDFS-16836:
-------------------------------

Summary: StandbyCheckpointer can still trigger rollback fs image after RU is finalized
Key: HDFS-16836
URL: https://issues.apache.org/jira/browse/HDFS-16836
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs
Reporter: Lei Yang

StandbyCheckpointer triggers a rollback fsimage checkpoint when a rolling upgrade (RU) is started. When the RU starts, a flag (needRollbackImage) is set to true during edit log replay, and it is reset to false only when doCheckpoint() succeeds.

Consider the following scenario:
# Start RU; needRollbackImage is set to true.
# doCheckpoint() fails, so needRollbackImage is never reset to false.
# RU is finalized.
# needRollbackImage is still true, so the checkpoint period and threshold are not honored.

{code:java}
// StandbyCheckpointer
void doWork() {
  ....
  doCheckpoint();
  // reset needRollbackCheckpoint to false only when we finish a ckpt
  // for rollback image
  if (needRollbackCheckpoint
      && namesystem.getFSImage().hasRollbackFSImage()) {
    namesystem.setCreatedRollbackImages(true);
    namesystem.setNeedRollbackFsImage(false);
  }
  lastCheckpointTime = now;
}
{code}
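
The scenario in this report can be reproduced with a small stand-alone model of the flag's lifecycle. This is only a sketch, not the actual HDFS code: the class RollbackFlagSketch and all of its methods are hypothetical, and shouldCheckpoint() assumes that a set flag short-circuits the period and threshold checks, as the report describes.

```java
// Hypothetical model of the flag lifecycle from the report; not HDFS code.
class RollbackFlagSketch {
    // Stands in for the needRollbackImage flag named in the report.
    private boolean needRollbackImage = false;
    // Stands in for getFSImage().hasRollbackFSImage().
    private boolean hasRollbackFsImage = false;

    // Step 1: starting the RU sets the flag (during edit log replay in HDFS).
    void startRollingUpgrade() {
        needRollbackImage = true;
    }

    // One checkpoint attempt. A failed doCheckpoint() leaves the flag untouched.
    void doWork(boolean checkpointSucceeds) {
        if (!checkpointSucceeds) {
            return; // models doCheckpoint() failing before the reset is reached
        }
        if (needRollbackImage) {
            hasRollbackFsImage = true; // the successful checkpoint wrote a rollback image
        }
        // Same condition as the snippet in the report: reset only after success.
        if (needRollbackImage && hasRollbackFsImage) {
            needRollbackImage = false;
        }
    }

    // Step 3: finalizing the RU. In this model nothing clears the flag here.
    void finalizeRollingUpgrade() {
        hasRollbackFsImage = false;
    }

    // Assumption: a set flag forces a checkpoint regardless of period/threshold.
    boolean shouldCheckpoint(boolean periodOrThresholdReached) {
        return needRollbackImage || periodOrThresholdReached;
    }

    boolean needRollbackImage() {
        return needRollbackImage;
    }

    public static void main(String[] args) {
        RollbackFlagSketch s = new RollbackFlagSketch();
        s.startRollingUpgrade();     // step 1
        s.doWork(false);             // step 2: checkpoint fails
        s.finalizeRollingUpgrade();  // step 3
        // Step 4: the flag is still set, so a checkpoint is forced even though
        // neither the period nor the threshold has been reached.
        System.out.println("flag after finalize: " + s.needRollbackImage());
        System.out.println("forced checkpoint:   " + s.shouldCheckpoint(false));
    }
}
```

In this model the flag is cleared only on the success path of doWork(), so any failure between starting and finalizing the RU leaves it set indefinitely.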