[ https://issues.apache.org/jira/browse/HDDS-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17059327#comment-17059327 ]
Yiqun Lin edited comment on HDDS-3180 at 3/14/20, 12:29 PM:
------------------------------------------------------------

We need to add a log message for the inconsistent state, because this state causes the Datanode to fail to start. A friendlier message, tested locally:
{noformat}
2020-03-14 04:41:27,249 [main] INFO (HddsVolume.java:177) - Creating Volume: /tmp/hadoop-hdfs/dfs/data/hdds of storage type : DISK and capacity : 9997713408
2020-03-14 04:41:27,250 [main] WARN (HddsVolume.java:252) - VERSION file does not exist in volume /tmp/hadoop-hdfs/dfs/data/hdds, current volume state: INCONSISTENT.
2020-03-14 04:41:27,257 [main] ERROR (MutableVolumeSet.java:202) - Failed to parse the storage location: file:///tmp/hadoop-hdfs/dfs/data
java.io.IOException: Volume is in an INCONSISTENT state. Skipped loading volume: /tmp/hadoop-hdfs/dfs/data/hdds
    at org.apache.hadoop.ozone.container.common.volume.HddsVolume.initialize(HddsVolume.java:226)
    at org.apache.hadoop.ozone.container.common.volume.HddsVolume.<init>(HddsVolume.java:180)
    at org.apache.hadoop.ozone.container.common.volume.HddsVolume.<init>(HddsVolume.java:71)
    at org.apache.hadoop.ozone.container.common.volume.HddsVolume$Builder.build(HddsVolume.java:158)
    at org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.createVolume(MutableVolumeSet.java:336)
{noformat}

was (Author: linyiqun):
We need to add a log message for the inconsistent state, because this state causes the Datanode to fail to start.

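For illustration, the kind of check that could produce the WARN line quoted in the comment might look roughly like the sketch below. This is a minimal, self-contained sketch and not the actual HddsVolume code: the class name `VolumeStateSketch`, the `Analysis` holder, the `analyze` method, and every state other than INCONSISTENT are hypothetical, and the file name `VERSION` is assumed from the wording of the log message.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch; it does not reproduce the real HddsVolume logic. */
public class VolumeStateSketch {

  /** Simplified states; only INCONSISTENT appears in the logs quoted here. */
  enum VolumeState { NORMAL, NON_EXISTENT, NOT_FORMATTED, INCONSISTENT }

  /** Assumed file name, based on the "VERSION file" wording of the log line. */
  static final String VERSION_FILE = "VERSION";

  /** Pairs the detected state with a human-readable reason for the log. */
  static final class Analysis {
    final VolumeState state;
    final String reason;

    Analysis(VolumeState state, String reason) {
      this.state = state;
      this.reason = reason;
    }
  }

  /** Inspects the volume root and explains why a state was chosen. */
  static Analysis analyze(File root) {
    if (!root.exists()) {
      return new Analysis(VolumeState.NON_EXISTENT,
          "Volume root " + root + " does not exist.");
    }
    if (!root.isDirectory()) {
      return new Analysis(VolumeState.INCONSISTENT,
          "Volume root " + root + " is not a directory.");
    }
    String[] entries = root.list();
    if (entries == null || entries.length == 0) {
      return new Analysis(VolumeState.NOT_FORMATTED,
          "Volume root " + root + " is empty.");
    }
    if (!new File(root, VERSION_FILE).exists()) {
      // The case from this issue: data is present but the VERSION file is gone.
      return new Analysis(VolumeState.INCONSISTENT,
          "VERSION file does not exist in volume " + root
              + ", current volume state: INCONSISTENT.");
    }
    return new Analysis(VolumeState.NORMAL, "Volume " + root + " looks healthy.");
  }

  public static void main(String[] args) throws IOException {
    // Simulate a volume directory that has content but no VERSION file.
    Path dir = Files.createTempDirectory("hdds-sketch");
    Files.createFile(dir.resolve("some-data"));
    Analysis result = analyze(dir.toFile());
    if (result.state != VolumeState.NORMAL) {
      // Log the reason before any exception is thrown, so the cause is visible.
      System.err.println("WARN " + result.reason);
    }
  }
}
```

Returning the reason together with the state lets the caller log the cause right before throwing the generic "Volume is in an INCONSISTENT state" exception, which is the gap the comment describes.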
> Datanode fails to start due to confused inconsistent volume state
> -----------------------------------------------------------------
>
> Key: HDDS-3180
> URL: https://issues.apache.org/jira/browse/HDDS-3180
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Affects Versions: 0.4.1
> Reporter: Yiqun Lin
> Assignee: Yiqun Lin
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> I hit an error in my test Ozone cluster when restarting a datanode. The log
> reports an inconsistent volume state, but without any other helpful detail:
> {noformat}
> 2020-03-14 02:31:46,204 [main] INFO (LogAdapter.java:51) - registered UNIX signal handlers for [TERM, HUP, INT]
> 2020-03-14 02:31:46,736 [main] INFO (HddsDatanodeService.java:204) - HddsDatanodeService host:lyq-xx.xx.xx.xx ip:xx.xx.xx.xx
> 2020-03-14 02:31:46,784 [main] INFO (HddsVolume.java:177) - Creating Volume: /tmp/hadoop-hdfs/dfs/data/hdds of storage type : DISK and capacity : 20063645696
> 2020-03-14 02:31:46,786 [main] ERROR (MutableVolumeSet.java:202) - Failed to parse the storage location: file:///tmp/hadoop-hdfs/dfs/data
> java.io.IOException: Volume is in an INCONSISTENT state. Skipped loading volume: /tmp/hadoop-hdfs/dfs/data/hdds
>     at org.apache.hadoop.ozone.container.common.volume.HddsVolume.initialize(HddsVolume.java:226)
>     at org.apache.hadoop.ozone.container.common.volume.HddsVolume.<init>(HddsVolume.java:180)
>     at org.apache.hadoop.ozone.container.common.volume.HddsVolume.<init>(HddsVolume.java:71)
>     at org.apache.hadoop.ozone.container.common.volume.HddsVolume$Builder.build(HddsVolume.java:158)
>     at org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.createVolume(MutableVolumeSet.java:336)
>     at org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.initializeVolumeSet(MutableVolumeSet.java:183)
>     at org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.<init>(MutableVolumeSet.java:139)
>     at org.apache.hadoop.ozone.container.common.volume.MutableVolumeSet.<init>(MutableVolumeSet.java:111)
>     at org.apache.hadoop.ozone.container.ozoneimpl.OzoneContainer.<init>(OzoneContainer.java:97)
>     at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine.<init>(DatanodeStateMachine.java:128)
>     at org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:235)
>     at org.apache.hadoop.ozone.HddsDatanodeService.start(HddsDatanodeService.java:179)
>     at org.apache.hadoop.ozone.HddsDatanodeService.call(HddsDatanodeService.java:154)
>     at org.apache.hadoop.ozone.HddsDatanodeService.call(HddsDatanodeService.java:78)
>     at picocli.CommandLine.execute(CommandLine.java:1173)
>     at picocli.CommandLine.access$800(CommandLine.java:141)
>     at picocli.CommandLine$RunLast.handle(CommandLine.java:1367)
>     at picocli.CommandLine$RunLast.handle(CommandLine.java:1335)
>     at picocli.CommandLine$AbstractParseResultHandler.handleParseResult(CommandLine.java:1243)
>     at picocli.CommandLine.parseWithHandlers(CommandLine.java:1526)
>     at picocli.CommandLine.parseWithHandler(CommandLine.java:1465)
>     at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:65)
>     at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:56)
>     at org.apache.hadoop.ozone.HddsDatanodeService.main(HddsDatanodeService.java:137)
> 2020-03-14 02:31:46,795 [shutdown-hook-0] INFO (LogAdapter.java:51) - SHUTDOWN_MSG:
> {noformat}
> I then looked into the code, and the root cause is that the VERSION file was
> lost on that node.
> We should also log a key message to help users quickly identify the root
> cause.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org