Thanks for the response, Suresh.

I'm not sure that I understand the details properly.  From my reading of
HDFS-744, the hsync API would allow a client to make sure that, at any point
in time, its writes so far have hit the disk.  For example, HBase could
apply an hsync after adding some edits to its WAL to ensure those edits are
fully durable for a file which is still open.
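
To check my understanding, here is roughly how I would expect a client to
use that API once it is available (just a sketch against the 2.x
FSDataOutputStream interface; the path and the write are made up):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WalSyncSketch {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Hypothetical WAL path, for illustration only.
        FSDataOutputStream out = fs.create(new Path("/hbase/example-wal"));
        out.write("edit 1\n".getBytes("UTF-8"));
        // hflush() pushes the bytes to the datanodes' memory; hsync()
        // (added by HDFS-744 in 2.0.2-alpha) additionally asks them to
        // fsync the data to disk while the file is still open.
        out.hsync();
        // ... more edits, each followed by a sync ...
        out.close();
        fs.close();
      }
    }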

However, in this case the dfs file was closed and even renamed.  Is it the
case that, even after a dfs file is closed and renamed, the data blocks
would still not be synced and would still be stored by the datanode in
"blocksBeingWritten" rather than in "current"?  If that is the case, would
it be better for the NameNode not to reject replicas that are in
blocksBeingWritten, especially if it doesn't have any other replicas
available?
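
Part of what I'm trying to pin down is what "synced" means at the local
file system level.  My mental model of a fully durable finalize-and-rename
on ext3 is something like the following plain java.nio sketch (nothing
HDFS-specific; the paths only loosely mirror the datanode layout, and I
don't know whether the datanode actually does anything equivalent):

    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;
    import java.nio.file.StandardOpenOption;

    public class DurableRenameSketch {
      public static void main(String[] args) throws Exception {
        // Hypothetical local paths, loosely mirroring the DN layout.
        Path tmp = Paths.get("/data/blocksBeingWritten/blk_example");
        Path fin = Paths.get("/data/current/blk_example");

        // Write the block data and fsync the file before finalizing it.
        try (FileChannel ch = FileChannel.open(tmp,
            StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
          ch.write(ByteBuffer.wrap("block bytes".getBytes("UTF-8")));
          ch.force(true);   // fsync of the file itself
        }

        // Move the finalized block into place.
        Files.move(tmp, fin, StandardCopyOption.ATOMIC_MOVE);

        // On ext3/ext4 the rename itself is only guaranteed to survive a
        // power failure once the parent directory has been fsync'd as well
        // (opening a directory just to force() it works on Linux but is
        // not portable everywhere).
        try (FileChannel dir = FileChannel.open(fin.getParent(),
            StandardOpenOption.READ)) {
          dir.force(true);
        }
      }
    }

If a step like that last directory sync is missing somewhere, ext3 rolling
back the rename after the power loss doesn't seem crazy to me - that's what
prompted the ext3 question in my original message below.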

Dave


On Mon, Jul 1, 2013 at 3:16 PM, Suresh Srinivas <sur...@hortonworks.com> wrote:

> Yes this is a known issue.
>
> The HDFS part of this was addressed in
> https://issues.apache.org/jira/browse/HDFS-744 for 2.0.2-alpha and is not
> available in 1.x releases. I think HBase does not use this API yet.
>
>
> On Mon, Jul 1, 2013 at 3:00 PM, Dave Latham <lat...@davelink.net> wrote:
>
> > We're running HBase over HDFS 1.0.2 on about 1000 nodes.  On Saturday the
> > data center we were in had a total power failure and the cluster went down
> > hard.  When we brought it back up, HDFS reported 4 files as CORRUPT.  We
> > recovered the data in question from our secondary datacenter, but I'm
> > trying to understand what happened and whether this is a bug in HDFS that
> > should be fixed.
> >
> > From what I can tell, the file was created and closed by the dfs client
> > (hbase).  Then HBase renamed it into a new directory and deleted some other
> > files containing the same data.  Then the cluster lost power.  After the
> > cluster was restarted, the datanodes reported in to the namenode, but the
> > blocks for this file appeared as "blocks being written" - the namenode
> > rejected them and the datanodes deleted the blocks.  At this point there
> > were no replicas for the blocks and the files were marked CORRUPT.  The
> > underlying file systems are ext3.  Some questions that I would love to get
> > answers for, if anyone with a deeper understanding of HDFS can chime in:
> >
> >  - Is this a known scenario where data loss is expected?  (I found
> > HDFS-1539, but that seems different)
> >  - When are blocks moved from blocksBeingWritten to current?  Does that
> > happen before a file close operation is acknowledged to an HDFS client?
> >  - Could it be that the DataNodes actually moved the blocks to current, but
> > after the restart ext3 rewound state somehow (forgive my ignorance of
> > underlying file system behavior)?
> >  - Is there any other explanation for how this can happen?
> >
> > Here is a sequence of selected relevant log lines from the RS (HBase
> > Region Server), NN (NameNode) and DN (DataNode - one example of the three
> > in question).  It includes everything that mentions the block in question
> > in the NameNode and one DataNode log.  Please let me know if there is more
> > information that would be helpful.
> >
> > RS 2013-06-29 11:16:06,812 DEBUG org.apache.hadoop.hbase.util.FSUtils: Creating file=hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c with permission=rwxrwxrwx
> > NN 2013-06-29 11:16:06,830 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c. blk_1395839728632046111_357084589
> > DN 2013-06-29 11:16:06,832 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_1395839728632046111_357084589 src: /10.0.5.237:14327 dest: /10.0.5.237:50010
> > NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.0.6.1:50010 is added to blk_1395839728632046111_357084589 size 25418340
> > NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.0.6.24:50010 is added to blk_1395839728632046111_357084589 size 25418340
> > NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.0.5.237:50010 is added to blk_1395839728632046111_357084589 size 25418340
> > DN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_1395839728632046111_357084589 of size 25418340 from /10.0.5.237:14327
> > DN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for block blk_1395839728632046111_357084589 terminating
> > NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: Removing lease on  file /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c from client DFSClient_hb_rs_hs745,60020,1372470111932
> > NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c is closed by DFSClient_hb_rs_hs745,60020,1372470111932
> > RS 2013-06-29 11:16:11,393 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming compacted file at hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c to hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/n/6e0cc30af6e64e56ba5a539fdf159c4c
> > RS 2013-06-29 11:16:11,505 INFO org.apache.hadoop.hbase.regionserver.Store: Completed major compaction of 7 file(s) in n of users-6,\x12\xBDp\xA3,1359426311784.b5b0820cde759ae68e333b2f4015bb7e. into 6e0cc30af6e64e56ba5a539fdf159c4c, size=24.2m; total size for store is 24.2m
> >
> > -------  CRASH, RESTART ---------
> >
> > NN 2013-06-29 12:01:19,743 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: addStoredBlock request received for blk_1395839728632046111_357084589 on 10.0.6.1:50010 size 21978112 but was rejected: Reported as block being written but is a block of closed file.
> > NN 2013-06-29 12:01:19,743 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_1395839728632046111 is added to invalidSet of 10.0.6.1:50010
> > NN 2013-06-29 12:01:20,155 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: addStoredBlock request received for blk_1395839728632046111_357084589 on 10.0.5.237:50010 size 16971264 but was rejected: Reported as block being written but is a block of closed file.
> > NN 2013-06-29 12:01:20,155 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_1395839728632046111 is added to invalidSet of 10.0.5.237:50010
> > NN 2013-06-29 12:01:20,175 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: addStoredBlock request received for blk_1395839728632046111_357084589 on 10.0.6.24:50010 size 21913088 but was rejected: Reported as block being written but is a block of closed file.
> > NN 2013-06-29 12:01:20,175 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_1395839728632046111 is added to invalidSet of 10.0.6.24:50010
> > (note the clock on the server running the DN is wrong after the restart.  I believe timestamps are off by 6 hours:)
> > DN 2013-06-29 06:07:22,877 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Scheduling block blk_1395839728632046111_357084589 file /data/hadoop/dfs/data/blocksBeingWritten/blk_1395839728632046111 for deletion
> > DN 2013-06-29 06:07:24,952 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleted block blk_1395839728632046111_357084589 at file /data/hadoop/dfs/data/blocksBeingWritten/blk_1395839728632046111
> >
> >
> > Thanks,
> > Dave
> >
>
>
>
> --
> http://hortonworks.com/download/
>
