Konstantin,

Thanks for your comments, questions, and feedback. I have attached a document to the HDFS-7240 JIRA that explains a design for scaling HDFS and how Ozone paves the way towards the full solution.
https://issues.apache.org/jira/secure/attachment/12895963/HDFS%20Scalability%20and%20Ozone.pdf

sanjay

> On Oct 28, 2017, at 2:00 PM, Konstantin Shvachko <shv.had...@gmail.com> wrote:
>
> Hey guys,
>
> It is an interesting question whether Ozone should be a part of Hadoop.
> There are two main reasons why I think it should not.
>
> 1. With close to 500 sub-tasks, 6 MB of code changes, and a sizable
> community behind it, it looks to me like a whole new project. It is
> essentially a new storage system, with a different architecture than HDFS
> and separate S3-like APIs. This is really great - the world sure needs
> more distributed file systems. But it is not clear why Ozone should
> co-exist with HDFS under the same roof.
>
> 2. Ozone is probably just the first step in rebuilding HDFS under a new
> architecture, with the next steps presumably being HDFS-10419 and
> HDFS-11118. The design doc for the new architecture has never been
> published. I can only assume, based on some presentations and personal
> communications, that the idea is to use Ozone as a block storage and
> re-implement the NameNode so that it stores only a partial namespace in
> memory, while the bulk of it (cold data) is persisted to local storage.
> Such an architecture makes me wonder whether it solves Hadoop's main
> problems. There are two main limitations in HDFS:
> a. The throughput of namespace operations, which is limited by the number
> of RPCs the NameNode can handle.
> b. The number of objects (files + blocks) the system can maintain, which
> is limited by the memory size of the NameNode.
> The RPC performance (a) is more important for Hadoop scalability than the
> object count (b), with read RPCs being the main priority. The new
> architecture targets the object count problem, but at the expense of RPC
> throughput, which seems to be the wrong resolution of the tradeoff.
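[Editor's note: for concreteness, the partial-namespace idea discussed above (a hot working set in memory, with cold entries spilled to local storage) behaves like an LRU cache in front of a slower store. The toy Java sketch below illustrates only that caching mechanism; the class name, capacity, and the map standing in for on-disk storage are hypothetical and imply nothing about any actual NameNode design.]

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy illustration of a "partial namespace in memory" scheme:
 * hot path -> inode entries live in a bounded LRU map, and evicted
 * entries spill to a cold store (a plain HashMap here, standing in
 * for on-disk storage). All names and sizes are hypothetical.
 */
public class PartialNamespace {
    private final Map<String, String> coldStore = new HashMap<>();
    private final int capacity;
    private final LinkedHashMap<String, String> hotCache;

    public PartialNamespace(int capacity) {
        this.capacity = capacity;
        // accessOrder = true makes the map track least-recently-used order
        this.hotCache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                if (size() > PartialNamespace.this.capacity) {
                    // Spill the least-recently-used entry to the "disk" store
                    coldStore.put(eldest.getKey(), eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    public void put(String path, String inode) {
        hotCache.put(path, inode);
    }

    /** A hit is served from memory; a miss pays an extra cold-store read. */
    public String get(String path) {
        String inode = hotCache.get(path);
        if (inode == null) {
            inode = coldStore.remove(path);
            if (inode != null) {
                hotCache.put(path, inode); // promote back into the hot set
            }
        }
        return inode;
    }

    public static void main(String[] args) {
        PartialNamespace ns = new PartialNamespace(2);
        ns.put("/a", "inode-a");
        ns.put("/b", "inode-b");
        ns.put("/c", "inode-c");          // evicts /a to the cold store
        System.out.println(ns.get("/a")); // prints "inode-a" (cold read, then re-cached)
    }
}
```

The sketch also shows where Konstantin's concern bites: every lookup outside the hot set pays an extra cold-store read, so if the working set exceeds the in-memory cache, read RPC latency degrades even as the total object count scales.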
> Also, based on the usage patterns on our large clusters, we read up to 90%
> of the data we write, so cold data is a small fraction and most of it must
> be cached.
>
> To summarize:
> - Ozone is a big enough system to deserve its own project.
> - The architecture that Ozone leads to does not seem to solve the intrinsic
>   problems of current HDFS.
>
> I will post my opinion in the Ozone JIRA; it should be more convenient to
> discuss it there for further reference.
>
> Thanks,
> --Konstantin
>
> On Wed, Oct 18, 2017 at 6:54 PM, Yang Weiwei <cheersy...@hotmail.com> wrote:
>
>> Hello everyone,
>>
>> I would like to start this thread to discuss merging Ozone (HDFS-7240) to
>> trunk. This feature implements an object store which can co-exist with
>> HDFS. Ozone is disabled by default. We have tested Ozone with cluster
>> sizes varying from 1 to 100 data nodes.
>>
>> The merge payload includes the following:
>>
>> 1. All services and management scripts
>> 2. Object store APIs, exposed via both REST and RPC
>> 3. Master service UIs and command line interfaces
>> 4. Pluggable pipeline integration
>> 5. Ozone File System (a Hadoop compatible file system implementation that
>>    passes all FileSystem contract tests)
>> 6. Corona - a load generator for Ozone
>> 7. Essential documentation added to the Hadoop site
>> 8. Version-specific Ozone documentation, accessible via the service UI
>> 9. Docker support for Ozone, which enables faster development cycles
>>
>> To build Ozone and run it using Docker, please follow the instructions on
>> this wiki page:
>> https://cwiki.apache.org/confluence/display/HADOOP/Dev+cluster+with+docker
>>
>> We have built a passionate and diverse community to drive this feature's
>> development. As a team, we have achieved significant progress in the past
>> 3 years since the first JIRA for HDFS-7240 was opened in Oct 2014. So far,
>> we have resolved almost 400 JIRAs by 20+ contributors/committers from
>> different countries and affiliations.
>> We also want to thank the large number of community members who were
>> supportive of our efforts, contributed ideas, and participated in the
>> design of Ozone.
>>
>> Please share your thoughts, thanks!
>>
>> -- Weiwei Yang