Re: HDFS2 vs MaprFS

2016-06-06 Thread Aaron Eng
As others have answered, the number of blocks/files/directories that can be addressed by a NameNode is limited by the amount of heap space available to the NameNode JVM. If you need more background on this topic, I'd suggest reviewing various materials from Hadoop JIRA and other vendors that suppl

Re: HDFS2 vs MaprFS

2016-06-06 Thread Ascot Moss
Hi Aaron, from the MapR site, [now HDFS2] "Limit to 50-200 million files"; is it really true? On Tue, Jun 7, 2016 at 12:09 AM, Aaron Eng wrote: > As I said, MapRFS has topologies. You assign a volume (which is mounted > at a directory path) to a topology and in turn all the data for the volume > (e

Re: HDFS2 vs MaprFS

2016-06-06 Thread Aaron Eng
As I said, MapRFS has topologies. You assign a volume (which is mounted at a directory path) to a topology and in turn all the data for the volume (e.g. under the directory) is stored on the storage hardware assigned to the topology. These topological labels provide the same benefits as dfs.stora

Re: HDFS2 vs MaprFS

2016-06-06 Thread Ascot Moss
In HDFS2, I can find "dfs.storage.policy"; for instance, HDFS2 lets you *apply the COLD storage policy to a directory*. Where are these features in Mapr-FS? On Mon, Jun 6, 2016 at 11:43 PM, Aaron Eng wrote: > >Since MapR is proprietary, I find that it has many compatibility issues > in Apac

Re: HDFS2 vs MaprFS

2016-06-06 Thread Aaron Eng
>Since MapR is proprietary, I find that it has many compatibility issues in Apache open source projects This is faulty logic. And rather than saying it has "many compatibility issues", perhaps you can describe one. Both MapRFS and HDFS are accessible through the same API. The backend implementa

Re: HDFS2 vs MaprFS

2016-06-06 Thread Ascot Moss
Since MapR is proprietary, I find that it has many compatibility issues in Apache open source projects, or, even worse, loses Hadoop's features. For instance, Hadoop has a built-in storage policy named COLD; where is it in Mapr-FS? Not to mention that Mapr-FS loses data locality. On Mon, Jun 6, 2

Re: HDFS2 vs MaprFS

2016-06-06 Thread Ascot Moss
I don't think HDFS2 needs a SAN; using the QuorumJournal approach is much better than the shared edits directory (SAN) approach. On Monday, June 6, 2016, Peyman Mohajerian wrote: > It is very common practice to back up the metadata in some SAN store. So > the idea of complete loss of all the metada

Re: HDFS2 vs MaprFS

2016-06-05 Thread Peyman Mohajerian
It is very common practice to back up the metadata in some SAN store. So the idea of complete loss of all the metadata is preventable. You could lose a day's worth of data if, e.g., you back up the metadata once a day, but you could do it more frequently. I'm not saying S3 or Azure Blob are bad ideas. On S

Re: HDFS2 vs MaprFS

2016-06-05 Thread Marcin Tustin
The namenode architecture is a source of fragility in HDFS. While a high availability deployment (with two namenodes, and a failover mechanism) means you're unlikely to see service interruption, it is still possible to have a complete loss of filesystem metadata with the loss of two machines. Seco

Re: HDFS2 vs MaprFS

2016-06-05 Thread Hayati Gonultas
Another correction about the terminology needs to be made. I said 1 GB = 1 million blocks. Pay attention to the term block; it is not a file. A file may contain more than one block. The default block size is 64 MB, so a 640 MB file will hold 10 blocks. Each file has its name, permissions, path, creation date and et
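The block arithmetic in this message can be sketched as follows. This is a minimal illustration, assuming the 64 MB block size quoted in the post (Hadoop 2.x ships with a 128 MB default for `dfs.blocksize`, so the second argument is left configurable); the function name `block_count` is hypothetical and not part of any Hadoop API.

```python
import math


def block_count(file_size_mb, block_size_mb=64):
    """Return how many blocks a file of the given size occupies.

    The last block may be only partially filled, so the division is
    rounded up. The 64 MB default is the figure quoted in the thread.
    """
    if file_size_mb <= 0:
        return 0  # an empty file occupies no blocks
    return math.ceil(file_size_mb / block_size_mb)


# The example from the post: a 640 MB file with 64 MB blocks.
print(block_count(640))
# One extra megabyte spills into an additional, mostly empty block.
print(block_count(641))
# The same file with a 128 MB block size needs half as many blocks.
print(block_count(640, 128))
```

Because the NameNode tracks blocks as well as files, the same amount of data costs more NameNode memory when it is split into many small files than when it sits in a few large ones.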

Re: HDFS2 vs MaprFS

2016-06-05 Thread Hayati Gonultas
I wrote 128 000 000 million in my previous post. That was incorrect (a million million); what I meant is 128 million. 1 GB is roughly 1 million. On 5 Jun 2016 at 16:58, "Ascot Moss" wrote: > HDFS2 "Limit to 50-200 million files", is it really true like what MapR > says? > > On Sun, Jun 5, 2016 a

Re: HDFS2 vs MaprFS

2016-06-05 Thread Hayati Gonultas
No, it is not true. It depends entirely on the server's RAM. Assume that each file takes 1 KB of RAM and your server has 128 GB of RAM. So you will have 128 000 000 million files. But 1 KB is just an approximation. Roughly, 1 GB holds 1 million blocks. So if your server has 512 GB of RAM then you can approximately
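The estimate in this message can be worked through in a few lines. This is only a sketch of the rule of thumb stated in the post (about 1 KB of NameNode RAM per object, hence roughly 1 million objects per GB); decimal units (1 GB = 10^9 bytes) and the function name `estimated_objects` are assumptions made for illustration, and real heap usage per file or block varies.

```python
BYTES_PER_GB = 10**9  # assumption: decimal gigabytes, to match the round figures in the post


def estimated_objects(ram_gb, bytes_per_object=1000):
    """Estimate how many namespace objects fit in the given amount of RAM.

    bytes_per_object defaults to the ~1 KB approximation from the thread.
    """
    return (ram_gb * BYTES_PER_GB) // bytes_per_object


# 128 GB at ~1 KB per object: 128 million objects, not "128 000 000 million".
print(f"{estimated_objects(128):,}")
# 512 GB under the same assumption.
print(f"{estimated_objects(512):,}")
```

Under these assumptions the ceiling scales linearly with the memory given to the NameNode, which is why the answer depends on the server rather than on a fixed file-count limit.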

Re: HDFS2 vs MaprFS

2016-06-05 Thread Ascot Moss
HDFS2 "Limit to 50-200 million files": is that really true, as MapR says? On Sun, Jun 5, 2016 at 7:55 PM, Hayati Gonultas wrote: > I forgot to mention about file system limit. > > Yes HDFS has limit, because for the performance considerations HDFS > filesystem is read from disk to RAM and re

Re: HDFS2 vs MaprFS

2016-06-05 Thread Hayati Gonultas
I forgot to mention the file system limit. Yes, HDFS has a limit, because for performance reasons the HDFS filesystem image is read from disk into RAM and the rest of the work is done in RAM. So RAM should be big enough to fit the filesystem image. But HDFS has configuration options like har files (Ha

Re: HDFS2 vs MaprFS

2016-06-05 Thread Hayati Gonultas
Hi, In most cases I think one cluster is enough. HDFS is a file system, and with federation you may have multiple namenodes for different mount points. So, you may mount /images/facebook to namenode1 and /images/instagram to namenode2, similar to linux file system mounts. With such a way y
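The mount-point idea in this message can be modelled with a small lookup table. This is a toy sketch, not the HDFS or ViewFs API: the table contents come from the example paths in the post, and the function name `resolve_namenode` is hypothetical. It only shows how a client-side table can route a path to the namenode that owns the longest matching mount point.

```python
from typing import Dict, Optional

# Mount points taken from the example in the post; namenode names are placeholders.
MOUNT_TABLE: Dict[str, str] = {
    "/images/facebook": "namenode1",
    "/images/instagram": "namenode2",
}


def resolve_namenode(path: str, mount_table: Dict[str, str] = MOUNT_TABLE) -> Optional[str]:
    """Return the namenode responsible for path, or None if no mount point matches.

    A mount point matches when it equals the path or is a parent directory
    of it. When several match, the longest (most specific) one wins.
    """
    best_mount = None
    for mount_point in mount_table:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best_mount is None or len(mount_point) > len(best_mount):
                best_mount = mount_point
    return mount_table[best_mount] if best_mount is not None else None


print(resolve_namenode("/images/facebook/2016/photo.jpg"))
print(resolve_namenode("/images/instagram"))
print(resolve_namenode("/tmp/scratch"))
```

Matching on whole path components (note the trailing slash in the prefix) keeps a sibling directory such as /images/facebook_old from being routed to namenode1 by accident.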

Re: HDFS2 vs MaprFS

2016-06-05 Thread Ascot Moss
Will the common pool of datanodes and namenode federation be a more effective alternative in HDFS2 than multiple clusters? On Sun, Jun 5, 2016 at 12:19 PM, daemeon reiydelle wrote: > There are indeed many tuning points here. If the name nodes and journal > nodes can be larger, perhaps even

Re: HDFS2 vs MaprFS

2016-06-04 Thread daemeon reiydelle
There are indeed many tuning points here. If the name nodes and journal nodes can be larger, perhaps even bonding multiple 10gbyte nics, one can easily scale. I did have one client where the file counts forced multiple clusters. But we were able to differentiate by airframe types ... eg fixed wing

Re: HDFS2 vs MaprFS

2016-06-04 Thread Gavin Yue
Here is what I found on the Hortonworks website. Namespace scalability: While HDFS cluster storage scales horizontally with the addition of datanodes, the namespace does not. Currently the namespace can only be vertically scaled on a single namenode. The namenode stores the entire file system metadat

HDFS2 vs MaprFS

2016-06-04 Thread Ascot Moss
Hi, I read some (old?) articles on the Internet about Mapr-FS vs HDFS. https://www.mapr.com/products/m5-features/no-namenode-architecture It states that HDFS Federation has a) "Multiple Single Points of Failure"; is it really true? Why does MapR use HDFS rather than HDFS2 in its comparison, as this would