Re: [DISCUSS] Separating out the metastore as its own TLP

Harsha Fri, 30 Jun 2017 13:25:04 -0700

Thanks for the proposal, Alan. I am +1 on separating the Hive Metastore.
This is a great opportunity to build a Metastore that addresses schemas
not only for data at rest but also for data in motion. We have a
SchemaRegistry project (http://github.com/hortonworks/registry) that
allows users to register schemas for data in motion and integrates with
Kafka, Kinesis, Event Hubs and other messaging queues. This will give
us the opportunity to integrate our APIs with the Hive Metastore and
provide one project that is truly a single metastore that can hold
all schemas.


Thanks,
Harsha

On Fri, Jun 30, 2017, at 01:18 PM, Sergio Pena wrote:
> Great, thanks Alan for putting all this in the email.
> +1
> 
> Allowing other components to continue to use the Metastore without the
> need to use Hive dependencies is a big plus for them. I agree with
> everything you mention in the email.
> 
> - Sergio
> 
> On Fri, Jun 30, 2017 at 1:49 PM, Julian Hyde <jh...@apache.org> wrote:
> 
> > +1
> >
> > As a Calcite PMC member, I am very pleased to see this change. Calcite
> > reads metadata from a variety of sources (including JDBC databases, NoSQL
> > databases such as Cassandra and Druid, and streaming systems), and if more
> > of those sources choose to store their metadata in the metastore it will
> > make our lives easier.
> >
> > Hive’s metastore has established a position as the place to go for
> > metadata in the Hadoop ecosystem. Not all metadata is relational, or
> > processed by Hive, so there are other parties using the metastore who
> > justifiably would like to influence its direction. Opening up the metastore
> > will help retain and extend this position.
> >
> > Julian
> >
> >
> > On 2017-06-30 10:00 (-0700), "Dimitris ts...@apache.org> wrote:
> > >
> > >
> > > On 2017-06-30 07:56 (-0700), Alan Gates <al...@gmail.com> wrote:
> > > > A few of us have been talking and come to the conclusion that it
> > > > would be a good thing to split out the Hive metastore into its own
> > > > Apache project.  Below and in the linked wiki page we explain what
> > > > we see as the advantages to this and how we would go about it.
> > > >
> > > > Hive’s metastore has long been used by other projects in the Hadoop
> > > > ecosystem to store and access metadata.  Apache Impala, Apache Spark,
> > > > Apache Drill, Presto, and other systems all use Hive’s metastore.
> > > > Some, like Impala and Presto, can use it as their own metadata
> > > > system with the rest of Hive not present.
> > > >
> > > > This sharing is excellent for the ecosystem.  Together with HDFS it
> > > > allows users to use the tool of their choice while still accessing
> > > > the same shared data.  But having this shared metadata inside the
> > > > Hive project limits the ability of other projects to contribute to
> > > > the metastore.  It also makes it harder for new systems that have
> > > > similar but not identical metadata requirements (for example, stream
> > > > processing systems on top of Apache Kafka) to use Hive’s metastore.
> > > > This difficulty for other systems comes out in two ways.  First, it
> > > > is hard for non-Hive community members to participate in the
> > > > project.  Second, it adds operational cost since users are forced
> > > > to deploy all of the Hive jars just to get the metastore to work.
> > > >
> > > > Therefore we propose to split Hive’s metastore out into a separate
> > > > Apache project.  This new project will continue to support the same
> > > > Thrift API as the current metastore.  It will continue to focus on
> > > > being a high performance, fault tolerant, large scale, operational
> > > > metastore for SQL engines and other systems that want to store
> > > > schema information about their data.
> > > >
> > > > By making it a separate project we will enable other projects to
> > > > join us in innovating on the metastore.  It will simplify operations
> > > > for non-Hive users that want to use the metastore as they will no
> > > > longer need to install Hive just to get the metastore.  And it will
> > > > attract new projects that might otherwise feel the need to solve
> > > > their metadata problems on their own.
> > > >
> > > > Any Hive PMC member or committer will be welcome to join the new
> > > > project at the same level.  We propose this project go straight to
> > > > a top level project.  Given that the initial PMC will be formed from
> > > > experienced Hive PMC members we do not believe incubation will be
> > > > necessary.  (Note that the Apache board will need to approve this.)
> > > >
> > > > Obviously there are many details involved in a proposal like this.
> > > > Rather than make this a ten page email we have filled out many of
> > > > the details in a wiki page:
> > > > https://cwiki.apache.org/confluence/display/Hive/Metastore+TLP+Proposal
> > > >
> > > > Yongzhi Chen
> > > > Vihang Karajgaonkar
> > > > Sergio Pena
> > > > Sahil Takiar
> > > > Aihua Xu
> > > > Gunther Hagleitner
> > > > Thejas Nair
> > > > Alan Gates
> > > >
> > >
> > > +1 (from Apache Impala's (incubating) perspective)
> > >
> > > Dimitris
> > >
> >


