[ https://issues.apache.org/jira/browse/HDFS-11413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865333#comment-15865333 ]
Weiwei Yang commented on HDFS-11413:
------------------------------------

Hi [~NishantVerma]

Can you post this to the user mailing list or Stack Overflow instead? JIRA is used to track bugs and dev tasks, not user issues.

> HDFS fsck command shows health as corrupt for '/'
> -------------------------------------------------
>
>                  Key: HDFS-11413
>                  URL: https://issues.apache.org/jira/browse/HDFS-11413
>              Project: Hadoop HDFS
>           Issue Type: Bug
>             Reporter: Nishant Verma
>
> I have an open source Hadoop 2.7.3 cluster (2 masters + 3 slaves) installed on AWS EC2 instances. I am using the cluster to integrate it with Kafka Connect.
> The cluster was set up last month and the Kafka Connect setup was completed last fortnight. Since then, we were able to commit Kafka topic records to our HDFS and perform various operations on them.
> Since yesterday afternoon, no Kafka topic is getting committed to the cluster. When I try to open the older files, I get the error below. When I copy a new file to the cluster from local, it opens at first, but after some time it starts showing a similar IOException:
> ==========================================================
> 17/02/14 07:57:55 INFO hdfs.DFSClient: No node available for BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 file=/test/inputdata/derby.log
> 17/02/14 07:57:55 INFO hdfs.DFSClient: Could not obtain BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 from any node: java.io.IOException: No live nodes contain block BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 after checking nodes = [], ignoredNodes = null No live nodes contain current block Block locations: Dead nodes: . Will get new block locations from namenode and retry...
> 17/02/14 07:57:55 WARN hdfs.DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 499.3472970548959 msec.
> 17/02/14 07:57:55 INFO hdfs.DFSClient: No node available for BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 file=/test/inputdata/derby.log
> 17/02/14 07:57:55 INFO hdfs.DFSClient: Could not obtain BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 from any node: java.io.IOException: No live nodes contain block BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 after checking nodes = [], ignoredNodes = null No live nodes contain current block Block locations: Dead nodes: . Will get new block locations from namenode and retry...
> 17/02/14 07:57:55 WARN hdfs.DFSClient: DFS chooseDataNode: got # 2 IOException, will wait for 4988.873277172643 msec.
> 17/02/14 07:58:00 INFO hdfs.DFSClient: No node available for BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 file=/test/inputdata/derby.log
> 17/02/14 07:58:00 INFO hdfs.DFSClient: Could not obtain BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 from any node: java.io.IOException: No live nodes contain block BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 after checking nodes = [], ignoredNodes = null No live nodes contain current block Block locations: Dead nodes: . Will get new block locations from namenode and retry...
> 17/02/14 07:58:00 WARN hdfs.DFSClient: DFS chooseDataNode: got # 3 IOException, will wait for 8598.311122824263 msec.
> 17/02/14 07:58:09 WARN hdfs.DFSClient: Could not obtain block: BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 file=/test/inputdata/derby.log No live nodes contain current block Block locations: Dead nodes: . Throwing a BlockMissingException
> 17/02/14 07:58:09 WARN hdfs.DFSClient: Could not obtain block: BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 file=/test/inputdata/derby.log No live nodes contain current block Block locations: Dead nodes: . Throwing a BlockMissingException
> 17/02/14 07:58:09 WARN hdfs.DFSClient: DFS Read
> org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 file=/test/inputdata/derby.log
>         at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:983)
>         at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:642)
>         at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882)
>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
>         at java.io.DataInputStream.read(DataInputStream.java:100)
>         at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
>         at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
>         at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
>         at org.apache.hadoop.fs.shell.Display$Cat.printToStdout(Display.java:107)
>         at org.apache.hadoop.fs.shell.Display$Cat.processPath(Display.java:102)
>         at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
>         at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
>         at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
>         at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
>         at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:201)
>         at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
>         at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>         at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
> cat: Could not obtain block: BP-1831277630-10.16.37.124-1484306078618:blk_1073793876_55013 file=/test/inputdata/derby.log
> ==========================================================
> When I do: hdfs fsck / , I get:
> ==========================================================
>  Total size:                    667782677 B
>  Total dirs:                    406
>  Total files:                   44485
>  Total symlinks:                0
>  Total blocks (validated):      43767 (avg. block size 15257 B)
>   ********************************
>   UNDER MIN REPL'D BLOCKS:      43766 (99.99772 %)
>   dfs.namenode.replication.min: 1
>   CORRUPT FILES:                43766
>   MISSING BLOCKS:               43766
>   MISSING SIZE:                 667781648 B
>   CORRUPT BLOCKS:               43766
>   ********************************
>  Minimally replicated blocks:   1 (0.0022848265 %)
>  Over-replicated blocks:        0 (0.0 %)
>  Under-replicated blocks:       0 (0.0 %)
>  Mis-replicated blocks:         0 (0.0 %)
>  Default replication factor:    3
>  Average block replication:     6.8544796E-5
>  Corrupt blocks:                43766
>  Missing replicas:              0 (0.0 %)
>  Number of data-nodes:          3
>  Number of racks:               1
> FSCK ended at Tue Feb 14 07:59:10 UTC 2017 in 932 milliseconds
> The filesystem under path '/' is CORRUPT
> ==========================================================
> That means all my files got corrupted somehow.
> I want to recover my HDFS and fix the corrupt health status.
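> The only recovery tooling I am aware of is the stock fsck/dfsadmin CLI; below is a minimal sketch of what I believe the next diagnostic steps would be (standard Hadoop 2.7.x commands, not yet run against this cluster):
> ==========================================================
> # List every path that currently has a missing or corrupt block
> hdfs fsck / -list-corruptfileblocks
> # Show block IDs and expected DataNode locations for one affected file
> hdfs fsck /test/inputdata/derby.log -files -blocks -locations
> # Confirm which DataNodes the NameNode currently considers live
> hdfs dfsadmin -report
> # Last resort, only if the replicas are truly gone: delete the affected
> # paths so the namespace becomes consistent again (the data is lost)
> hdfs fsck / -delete
> ==========================================================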
> Also, I would like to understand how such an issue occurred so suddenly and how to prevent it in the future.
> Many thanks,
> Nishant Verma

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)