You want to implement a RAID on top of HDFS, or use HDFS on top of RAID? I
am not sure I understand either of these use cases. HDFS handles
replication and error detection for you. Wouldn't fine-tuning the cluster
be the easier solution?
Bertrand Dechoux
On Mon, Jul 21, 2014 at 7:25 AM, Zesheng Wu
We want to implement a RAID on top of HDFS, something like facebook
implemented as described in:
https://code.facebook.com/posts/536638663113101/saving-capacity-with-hdfs-raid/
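For context, the capacity win described in the linked post comes from trading full replication for parity blocks. A rough back-of-envelope, assuming the stripe parameters commonly cited for HDFS-RAID (10 source blocks per stripe; XOR keeps 2 data replicas plus a doubly-replicated parity block; Reed-Solomon drops to a single data replica with 4 parity blocks). Treat these numbers as illustrative, not as read from the Facebook code:

```python
# Effective storage overhead per logical byte, for a 10-block stripe.
# Parameters are illustrative assumptions, not taken from the HDFS-RAID source.
STRIPE = 10  # source blocks per RAID stripe

# Plain HDFS: 3 replicas of every block.
plain = 3.0

# XOR RAID: data kept at 2 replicas, plus one XOR parity block
# (itself stored with 2 replicas) per stripe.
xor = 2.0 + 2.0 * 1 / STRIPE

# Reed-Solomon RAID: single replica of data, 4 parity blocks per stripe.
rs = 1.0 + 4.0 / STRIPE

print(plain, xor, rs)
```

Under these assumptions the overhead drops from 3x to roughly 2.2x (XOR) and 1.4x (Reed-Solomon), which matches the ratios the post advertises.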
2014-07-21 17:19 GMT+08:00 Bertrand Dechoux decho...@gmail.com:
You want to implement a RAID on top of HDFS or use HDFS on top of RAID?
So you know that a block is corrupted thanks to an external process, which
in this case is checking the parity blocks. If a block is corrupted but
hasn't been detected by HDFS, you could delete the block from the local
filesystem (it's only a file) and then HDFS will replicate the good
remaining replica of this block.
Thanks Bertrand, my reply comments inline following.
So you know that a block is corrupted thanks to an external process which
in this case is checking the parity blocks. If a block is corrupted but
hasn't been detected by HDFS, you could delete the block from the local
filesystem (it's only a file) then HDFS will replicate the good remaining
replica of this block.
We only have one replica for each block; if a block is corrupted, HDFS
cannot replicate it.
I wrote my answer thinking about the XOR implementation. With Reed-Solomon
and single replication, the cases that need to be considered are indeed
fewer and simpler.
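To make the XOR case concrete, here is a minimal sketch (plain Python, nothing HDFS-specific; block contents and sizes are made up) of how a single lost block in a stripe is rebuilt from the surviving blocks plus the XOR parity:

```python
def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

# A stripe of three data blocks (toy sizes; real HDFS blocks are 64/128 MB).
stripe = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]
parity = xor_blocks(stripe)

# Lose block 1; XOR of the survivors and the parity gives it back.
recovered = xor_blocks([stripe[0], stripe[2], parity])
assert recovered == stripe[1]
```

This is why XOR tolerates exactly one missing block per stripe: lose two, and the parity equation no longer has a unique solution, which is where Reed-Solomon's multiple parity blocks come in.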
It seems I was wrong about my last statement, though. If the machine hosting
a single-replicated block is lost, it isn't likely that …
And there is actually quite a lot of information about it.
https://github.com/facebook/hadoop-20/blob/master/src/contrib/raid/src/java/org/apache/hadoop/hdfs/DistributedRaidFileSystem.java
http://wiki.apache.org/hadoop/HDFS-RAID
Thanks Bertrand, I checked this information earlier. There's only an XOR
implementation, and missing blocks are reconstructed by creating new files.
2014-07-22 3:47 GMT+08:00 Bertrand Dechoux decho...@gmail.com:
And there is actually quite a lot of information about it.
Mmm, it seems that the facebook branch
https://github.com/facebook/hadoop-20/
https://github.com/facebook/hadoop-20/blob/master/src/contrib/raid/src/java/org/apache/hadoop/hdfs/DistributedRaidFileSystem.java
has implemented Reed-Solomon codes; what I was checking earlier were the
following two …
Thanks for the reply, Arpit.
Yes, we need to do this regularly. The original requirement is that we want
to do RAID (based on Reed-Solomon erasure codes) on our HDFS cluster. When
a block is corrupted or missing, the degraded read needs quick recovery of
the block. We are considering how …
That will break the consistency of the file system, but it doesn't hurt to
try.
On Jul 17, 2014 8:48 PM, Zesheng Wu wuzeshen...@gmail.com wrote:
How about writing a new block with a new checksum file, and replacing both
the old block file and the checksum file?
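For reference, HDFS stores per-block checksums in a companion .meta file, computed over fixed-size chunks (512 bytes by default), so a swapped-in block file needs a matching regenerated checksum file. A simplified sketch of the regeneration step — note this uses zlib's plain CRC32 and skips the real .meta header format, so it illustrates the idea rather than producing a drop-in .meta file:

```python
import struct
import zlib

BYTES_PER_CHECKSUM = 512  # HDFS default chunk size for checksums

def chunk_checksums(block_data):
    """One 4-byte CRC per 512-byte chunk, the general shape of HDFS block
    metadata. (Real HDFS uses CRC32C and a versioned .meta header; this is
    a sketch of the chunked-checksum idea only.)"""
    sums = []
    for off in range(0, len(block_data), BYTES_PER_CHECKSUM):
        chunk = block_data[off:off + BYTES_PER_CHECKSUM]
        sums.append(struct.pack(">I", zlib.crc32(chunk) & 0xFFFFFFFF))
    return b"".join(sums)

new_block = b"x" * 1300          # a rewritten block (toy data)
meta = chunk_checksums(new_block)
assert len(meta) == 4 * 3        # 1300 bytes -> 3 chunks -> 3 CRCs
```

Replacing both files atomically on the datanode, and keeping the namenode's view consistent, is the part this sketch deliberately does not cover — which is exactly the consistency concern raised below.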
2014-07-17 19:34 GMT+08:00 Wellington Chevreuil wellington.chevre...@gmail.com:
IMHO this is a spectacularly bad idea. Is it a one-off event? Why not just
take the perf hit and recreate the file?
If you need to do this regularly, you should consider a mutable file store
like HBase. If you start modifying blocks from under HDFS, you open up all
sorts of consistency issues.
Hi guys,
I recently encountered a scenario which requires replacing an existing
block with a newly written block.
The most straightforward way to do this may be the following:
Suppose the original file is A, and we write a new file B which is composed
of the new data blocks; then we merge A and B into C, which …
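The merge step described above can be sketched at the byte level (plain in-memory Python standing in for HDFS reads and writes; the offset and contents are made-up parameters): C takes A's content, except for the replaced range, which comes from B.

```python
def merge(a_data, b_data, offset):
    """Build C from A, with the byte range [offset, offset + len(B))
    replaced by B's content. A stand-in for a block-level merge; real HDFS
    would require rewriting everything from `offset` onward as a new file."""
    return a_data[:offset] + b_data + a_data[offset + len(b_data):]

a = b"AAAAAAAAAA"   # original file contents
b = b"bbb"          # newly written replacement data
c = merge(a, b, 4)
assert c == b"AAAAbbbAAA"
```

The catch, as the replies note, is that HDFS offers no in-place update: producing C means writing a whole new file and re-reading the unchanged parts of A, which is the perf hit being debated.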
Hi,
there's no way to do that, as HDFS does not provide file update features.
You'll need to write a new file with the changes.
Notice that even if you manage to find the physical block replica files on
the disk corresponding to the part of the file you want to change, you
can't simply …