Hi folks! Thank you for sharing the design docs, and for the tremendous amount of work that has gone into Ozone. I'm grateful that at least someone is trying to drastically improve HDFS.
*If* there is a meeting to discuss this merge, could I please also be invited?

Have we ever thought about distributing the NameNode metadata across nodes dynamically, based on load and RPC times (unlike the static federation we have now)? Also, I think a major feature that HDFS still lacks (and that a lot of our users ask for) is BCP / disaster recovery. I only bring this up to see whether the choice of proposed design would have implications for that later on.

Thanks,
Ravi

On Fri, Nov 3, 2017 at 1:56 PM, sanjay Radia <sanjayo...@gmail.com> wrote:
> Konstantin,
> Thanks for your comments, questions and feedback. I have attached a
> document to the HDFS-7240 jira that explains a design for scaling HDFS
> and how Ozone paves the way towards the full solution.
>
> https://issues.apache.org/jira/secure/attachment/12895963/HDFS%20Scalability%20and%20Ozone.pdf
>
> sanjay
>
>
> > On Oct 28, 2017, at 2:00 PM, Konstantin Shvachko <shv.had...@gmail.com> wrote:
> >
> > Hey guys,
> >
> > It is an interesting question whether Ozone should be a part of Hadoop.
> > There are two main reasons why I think it should not.
> >
> > 1. With close to 500 sub-tasks, 6 MB of code changes, and a sizable
> > community behind it, it looks to me like a whole new project. It is
> > essentially a new storage system, with a different (than HDFS)
> > architecture and separate S3-like APIs. This is really great - the world
> > sure needs more distributed file systems. But it is not clear why Ozone
> > should co-exist with HDFS under the same roof.
> >
> > 2. Ozone is probably just the first step in rebuilding HDFS under a new
> > architecture, with the next steps presumably being HDFS-10419 and
> > HDFS-11118. The design doc for the new architecture has never been
> > published. I can only assume, based on some presentations and personal
> > communications, that the idea is to use Ozone as block storage and
> > re-implement the NameNode so that it stores only a partial namespace in
> > memory, while the bulk of it (cold data) is persisted to local storage.
> > Such an architecture makes me wonder whether it solves Hadoop's main
> > problems. There are two main limitations in HDFS:
> > a. The throughput of namespace operations, which is limited by the
> > number of RPCs the NameNode can handle.
> > b. The number of objects (files + blocks) the system can maintain, which
> > is limited by the memory size of the NameNode.
> > The RPC performance (a) is more important for Hadoop scalability than
> > the object count (b), with read RPCs being the main priority. The new
> > architecture targets the object-count problem, but at the expense of RPC
> > throughput, which seems to be the wrong resolution of the tradeoff.
> > Also, based on the usage patterns on our large clusters, we read up to
> > 90% of the data we write, so cold data is a small fraction and most of
> > it must be cached.
> >
> > To summarize:
> > - Ozone is a big enough system to deserve its own project.
> > - The architecture that Ozone leads to does not seem to solve the
> > intrinsic problems of current HDFS.
> >
> > I will post my opinion in the Ozone jira; it should be more convenient
> > to discuss it there for further reference.
> >
> > Thanks,
> > --Konstantin
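
To make the tradeoff Konstantin describes above concrete, here is a minimal, purely illustrative Java sketch of a "partial namespace in memory, cold data on local storage" design: hot metadata lives in an LRU map and evicted entries spill to a hypothetical on-disk store. The names (PartialNamespace, ColdStore) are invented for this example and are not part of HDFS or Ozone; the point is only that every cache miss turns one in-memory lookup into an extra store read, which is why a low cache-hit rate would hurt RPC throughput.

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

class PartialNamespace {

  /** Hypothetical persistent store for evicted (cold) metadata; not a real HDFS/Ozone class. */
  interface ColdStore {
    void put(String path, byte[] inode);
    Optional<byte[]> get(String path);
  }

  private final ColdStore coldStore;
  private final Map<String, byte[]> hot;

  PartialNamespace(ColdStore store, int hotCapacity) {
    this.coldStore = store;
    // Access-ordered LinkedHashMap gives a simple LRU policy for the in-memory ("hot") part.
    this.hot = new LinkedHashMap<String, byte[]>(hotCapacity, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
        if (size() > hotCapacity) {
          store.put(eldest.getKey(), eldest.getValue()); // spill cold entry to local storage
          return true;
        }
        return false;
      }
    };
  }

  /** Hot paths are served from memory; cold paths pay an extra on-disk read. */
  Optional<byte[]> lookup(String path) {
    byte[] inMemory = hot.get(path);
    if (inMemory != null) {
      return Optional.of(inMemory);
    }
    Optional<byte[]> cold = coldStore.get(path);
    cold.ifPresent(value -> hot.put(path, value)); // promote back into memory on a miss
    return cold;
  }

  void create(String path, byte[] inode) {
    hot.put(path, inode); // may trigger eviction of the least-recently-used entry
  }
}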
> > On Wed, Oct 18, 2017 at 6:54 PM, Yang Weiwei <cheersy...@hotmail.com> wrote:
> >
> >> Hello everyone,
> >>
> >> I would like to start this thread to discuss merging Ozone (HDFS-7240)
> >> to trunk. This feature implements an object store which can co-exist
> >> with HDFS. Ozone is disabled by default. We have tested Ozone with
> >> cluster sizes varying from 1 to 100 data nodes.
> >>
> >> The merge payload includes the following:
> >>
> >> 1. All services, management scripts
> >> 2. Object store APIs, exposed via both REST and RPC
> >> 3. Master service UIs, command line interfaces
> >> 4. Pluggable pipeline integration
> >> 5. Ozone File System (a Hadoop-compatible file system implementation
> >>    that passes all FileSystem contract tests)
> >> 6. Corona - a load generator for Ozone
> >> 7. Essential documentation added to the Hadoop site
> >> 8. Version-specific Ozone documentation, accessible via the service UI
> >> 9. Docker support for Ozone, which enables faster development cycles
> >>
> >> To build Ozone and run it using Docker, please follow the instructions
> >> on this wiki page:
> >> https://cwiki.apache.org/confluence/display/HADOOP/Dev+cluster+with+docker
> >>
> >> We have built a passionate and diverse community to drive this
> >> feature's development. As a team, we have made significant progress in
> >> the 3 years since the first JIRA for HDFS-7240 was opened in Oct 2014.
> >> So far, we have resolved almost 400 JIRAs by 20+ contributors/committers
> >> from different countries and affiliations. We also want to thank the
> >> large number of community members who supported our efforts, contributed
> >> ideas, and participated in the design of Ozone.
> >>
> >> Please share your thoughts, thanks!
> >>
> >> -- Weiwei Yang
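
Regarding item 5 in the merge payload above: a Hadoop-compatible file system means code written against the generic org.apache.hadoop.fs.FileSystem API should work unchanged on Ozone. Below is a minimal sketch of such a client. The o3fs://bucket.volume/ URI scheme is an assumption made for the example and may not match the scheme the Ozone File System actually registers; the calls shown (FileSystem.get, create, open, listStatus) are the standard Hadoop FileSystem API, the kind of operations the FileSystem contract tests cover.

// Illustrative client only. Assumption: the "o3fs://bucket.volume/" URI scheme
// is hypothetical; everything else is the standard Hadoop FileSystem API that
// any Hadoop-compatible file system implements.
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OzoneFsSmokeTest {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The same code runs against HDFS or Ozone; only the file system URI changes.
    FileSystem fs = FileSystem.get(URI.create("o3fs://bucket.volume/"), conf);

    Path file = new Path("/smoke/hello.txt");
    try (FSDataOutputStream out = fs.create(file, true)) {   // overwrite = true
      out.writeUTF("hello ozone");
    }
    try (FSDataInputStream in = fs.open(file)) {
      System.out.println(in.readUTF());                      // read the data back
    }
    for (FileStatus st : fs.listStatus(file.getParent())) {  // directory listing
      System.out.println(st.getPath() + " " + st.getLen());
    }
    fs.close();
  }
}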