Re: when to use hive vs hbase

Shushant Arora Wed, 30 Apr 2014 05:14:29 -0700

Hi Jean

Thanks for the explanation.


I still have one doubt:
Why is HBase not a good fit for bulk loads and aggregations
(full table scans)? Hive will also have to read each row for an aggregation,
just as HBase does.
Can you explain more?


On Wed, Apr 30, 2014 at 5:15 PM, Jean-Marc Spaggiari <
[email protected]> wrote:

> Hi Shushant,
>
> Hive and HBase are two different things. You cannot really use one vs the
> other.
>
> Hive is a query engine against HDFS data. Data can be stored in different
> formats like flat text, sequence files, Parquet files, or even HBase tables.
> HBase is both a query engine (Gets and Scans) and a storage engine on top of
> HDFS which allows you to store data for random reads and random writes.
>
> Then you can also add tools like Phoenix and Impala to the picture, which
> will allow you to query the data from HDFS or HBase too.
>
> A good way to know if HBase is a good fit or not is to ask yourself how you
> are going to write into HBase or read from HBase. HBase is good for
> random reads and random writes. If you only do bulk loads and aggregations
> (full table scans), HBase is not a good fit. If you do random access (client
> information, event details, etc.), HBase is a good fit.
>
> It's a bit oversimplified, but that should give you some starting points.
>
>
> 2014-04-30 4:34 GMT-04:00 Shushant Arora <[email protected]>:
>
> > I have a requirement to process huge weblogs on a daily basis.
> >
> > 1. Data will arrive incrementally in the datastore on a daily basis, and I
> > need cumulative and daily distinct user counts from the logs. After that,
> > the aggregated data will be loaded into an RDBMS like MySQL.
> >
> > 2. Data will be loaded into an HDFS data warehouse on a daily basis, and the
> > same data will be fetched from the HDFS warehouse, after some filtering,
> > into an RDBMS like MySQL and will be processed there.
> >
> > Which data warehouse is suitable for approaches 1 and 2, and why?
> >
> > Thanks
> > Shushant
> >
>
