Yiqun Lin created HDDS-3180:
-------------------------------

Summary: Datanode shutdown due to inconsistent volume state without helpful error message
Key: HDDS-3180
URL: https://issues.apache.org/jira/browse/HDDS-3180
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Affects Versions: 0.4.1
Reporter: Yiqun Lin
Assignee: Yiqun Lin

I hit an error in my test Ozone cluster when restarting a datanode. The log reports an inconsistent volume state but gives no further detail about the cause:

{noformat}
2020-03-14 02:31:46,204 [main] INFO (LogAdapter.java:51) - registered UNIX signal handlers for [TERM, HUP, INT]
2020-03-14 02:31:46,736 [main] INFO (HddsDatanodeService.java:204) - HddsDatanodeService host:lyq-xx.xx.xx.xx ip:xx.xx.xx.xx
2020-03-14 02:31:46,784 [main] INFO (HddsVolume.java:177) - Creating Volume: /tmp/hadoop-hdfs/dfs/data/hdds of storage type : DISK and capacity : 20063645696
2020-03-14 02:31:46,786 [main] ERROR (MutableVolumeSet.java:202) - Failed to parse the storage location: file:///tmp/hadoop-hdfs/dfs/data
java.io.IOException: Volume is in an INCONSISTENT state. Skipped loading volume: /tmp/hadoop-hdfs/dfs/data/hdds
        at org.apache.hadoop.ozone.container.common.volume.HddsVolume.initialize(HddsVolume.java:226)
        at org.apache.hadoop.ozone.container.common.volume.HddsVolume.<init>(HddsVolume.java:180)
        at org.apache.hadoop.ozone.container.common.volume.HddsVolume.<init>(HddsVolume.java:71)
        at org.apache.hadoop.ozone.container.common.volume.HddsVolume$Builder.build(HddsVolume.java:158)
        at org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.createVolume(MutableVolumeSet.java:336)
        at org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.initializeVolumeSet(MutableVolumeSet.java:183)
        at org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.<init>(MutableVolumeSet.java:139)
        at org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.<init>(MutableVolumeSet.java:111)
        at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.<init>(OzoneContainer.java:97)
        at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.<init>(DatanodeStateMachine.java:128)
        at org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:235)
        at org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:179)
        at org.apache.hadoop.ozone.HddsDatanodeService.call(HddsDatanodeService.java:154)
        at org.apache.hadoop.ozone.HddsDatanodeService.call(HddsDatanodeService.java:78)
        at picocli.CommandLine.execute(CommandLine.java:1173)
        at picocli.CommandLine.access$800(CommandLine.java:141)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:1367)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:1335)
        at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243)
        at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526)
        at picocli.CommandLine.parseWithHandler(CommandLine.java:1465)
        at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65)
        at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:56)
        at org.apache.hadoop.ozone.HddsDatanodeService.main(HddsDatanodeService.java:137)
2020-03-14 02:31:46,795 [shutdown-hook-0] INFO (LogAdapter.java:51) - SHUTDOWN_MSG:
{noformat}

Looking into the code, I found the root cause: the version file was lost on that node. We should also log the key message (here, that the version file is missing) so that users can quickly identify the root cause.
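
To illustrate the kind of message this improvement asks for, here is a minimal, self-contained sketch of a volume check that returns a reason along with the state. It is not the actual HddsVolume code: the class `VolumeStateSketch`, the `Result` type, the states other than INCONSISTENT, and the version file name `VERSION` are all assumptions made for the example.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical sketch, not the real HddsVolume implementation.
public class VolumeStateSketch {

  // Only INCONSISTENT appears in the log above; the other states are assumed.
  enum VolumeState { NORMAL, NON_EXISTENT, NOT_FORMATTED, INCONSISTENT }

  // Pairs the detected state with a human-readable reason for the log.
  static final class Result {
    final VolumeState state;
    final String reason;

    Result(VolumeState state, String reason) {
      this.state = state;
      this.reason = reason;
    }
  }

  // Assumed name of the version file inside the volume root.
  static final String VERSION_FILE = "VERSION";

  static Result analyze(File root) {
    if (!root.exists()) {
      return new Result(VolumeState.NON_EXISTENT,
          "Volume root " + root + " does not exist");
    }
    if (!root.isDirectory()) {
      return new Result(VolumeState.INCONSISTENT,
          "Volume root " + root + " exists but is not a directory");
    }
    String[] children = root.list();
    if (children == null) {
      return new Result(VolumeState.INCONSISTENT,
          "Cannot list the contents of volume root " + root);
    }
    if (children.length == 0) {
      return new Result(VolumeState.NOT_FORMATTED,
          "Volume root " + root + " is empty");
    }
    File versionFile = new File(root, VERSION_FILE);
    if (!versionFile.isFile()) {
      // The case from this report: data is present, the version file is not.
      return new Result(VolumeState.INCONSISTENT,
          "Version file " + versionFile + " is missing");
    }
    return new Result(VolumeState.NORMAL, "Volume root " + root + " looks fine");
  }

  public static void main(String[] args) throws IOException {
    // Simulate a volume that has content but has lost its version file.
    File root = Files.createTempDirectory("hdds-sketch").toFile();
    new File(root, "leftover.dat").createNewFile();

    Result result = analyze(root);
    System.out.println(result.state + ": " + result.reason);
  }
}
```

With a reason attached to each state, the caller can include it in the ERROR line and in the IOException message, so the missing version file is visible without reading the source.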