Re: HDFS2 vs MaprFS

Gavin Yue Sat, 04 Jun 2016 14:23:59 -0700

Here is what I found on the Hortonworks website.


Namespace scalability

While HDFS cluster storage scales horizontally with the addition of datanodes, 
the namespace does not. Currently the namespace can only be vertically scaled 
on a single namenode.  The namenode stores the entire file system metadata in 
memory. This limits the number of blocks, files, and directories supported on 
the file system to what can be accommodated in the memory of a single namenode. 
A typical large deployment at Yahoo! includes an HDFS cluster with 2700-4200 
datanodes and 180 million files and blocks, addressing ~25 PB of storage.  At 
Facebook, HDFS has around 2600 nodes and 300 million files and blocks, 
addressing up to 60 PB of storage. While these are very large systems and good 
enough for the majority of Hadoop users, a few deployments that might want to 
grow even larger could find the namespace scalability limiting.
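
To put the "metadata in memory" point in numbers, here is a rough 
back-of-the-envelope sketch. It assumes the commonly cited rule of thumb of 
about 150 bytes of namenode heap per namespace object (file, directory, or 
block). That figure is my assumption, not something from the quoted text, and 
real usage varies with Hadoop version, path lengths, and replication. The 
function names are made up for illustration.

```python
# Rough namenode heap estimate. ASSUMPTION: ~150 bytes of heap per
# namespace object (file, directory, or block). This is a rule of thumb,
# not a figure taken from the quoted Hortonworks text.
BYTES_PER_OBJECT = 150
GIB = 2 ** 30


def estimate_heap_gib(num_objects, bytes_per_object=BYTES_PER_OBJECT):
    """Approximate heap (GiB) needed to hold num_objects in memory."""
    return num_objects * bytes_per_object / GIB


def max_objects_for_heap(heap_gib, bytes_per_object=BYTES_PER_OBJECT):
    """Approximate number of namespace objects that fit in heap_gib."""
    return int(heap_gib * GIB // bytes_per_object)


if __name__ == "__main__":
    # Object counts taken from the quoted text above.
    deployments = [("Yahoo!", 180_000_000), ("Facebook", 300_000_000)]
    for name, objects in deployments:
        print(f"{name}: {objects:,} objects -> "
              f"~{estimate_heap_gib(objects):.1f} GiB of namenode heap")
```

Under that assumption the namespace stops growing when a single JVM heap is 
full, no matter how many datanodes are added, which is the vertical scaling 
limit described above.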





> On Jun 4, 2016, at 04:43, Ascot Moss <ascot.m...@gmail.com> wrote:
> 
> Hi,
> 
> I read some (old?) articles on the Internet about MapR-FS vs HDFS. 
> 
> https://www.mapr.com/products/m5-features/no-namenode-architecture
> 
> It states that HDFS Federation has 
> 
> a) "Multiple Single Points of Failure", is it really true?  
> Why does MapR use HDFS rather than HDFS2 in its comparison? This leads to an 
> unfair (or even misleading) comparison. (HDFS was from Hadoop 1.x, the old 
> generation.) HDFS2 has been available since 2013-10-15, and there is no 
> Single Point of Failure in HDFS2.
> 
> b) "Limit to 50-200 million files", is it really true? 
> I have seen many real-world Hadoop clusters with over 10 PB of data, some even 
> with 150 PB.  If "Limit to 50-200 million files" were true in HDFS2, 
> why are there so many production Hadoop clusters in the real world? How do 
> they manage the "Limit to 50-200 million files" issue? For instance, 
> Facebook's "Like" implementation runs on HBase at web scale. I can imagine 
> HBase generating a huge number of files in Facebook's Hadoop cluster, so the 
> number of files in that cluster should be much bigger than 50-200 
> million.
> 
> From my point of view, in contrast, MaprFS should have a true limit of up to 
> 1T files, while HDFS2 can handle a truly unlimited number of files. Please 
> correct me if I am wrong.
> 
> c) "Performance Bottleneck", again, is it really true?
> MaprFS does not have a namenode, in order to gain file system performance. 
> Without a namenode, MaprFS would lose Data Locality, which is one of the 
> beauties of Hadoop. If Data Locality is no longer available, a big data 
> application running on MaprFS might gain some file system performance, but it 
> would lose the real performance gain from the Data Locality provided 
> by Hadoop's namenode (gain small, lose big).
> 
> d) "Commercial NAS required"
> Is there any wiki/blog/discussion about Commercial NAS on Hadoop Federation?
> 
> regards
>  
> 
>
