Re: Hive footprint

Naveen Gangam Mon, 25 Apr 2016 11:29:25 -0700

Hi Mich,
I am a developer at Cloudera and contribute to Apache Hive.

Hive and MPP query engine projects like Impala have settled into their
respective positions, so there is less confusion between these projects.


For example, across Cloudera's customer base the majority of customers use
Impala to perform BI and SQL analytics directly on Hadoop.
Most Impala users use Hive to prepare the data sets
they serve via Impala. As such, Impala typically competes with
traditional analytic databases, where customers decide between:
    * Using Hadoop and Hive for data processing that feeds into another
      database or BI layer for the analytics
    * A unified architecture where they directly serve some sets of BI and
      analytics from Hadoop via Impala while typically using Hive, Spark,
      MapReduce, etc. for their data preparation

Nearly all Hadoop distributions provide users with Hive for
core data processing plus an MPP query engine for BI and SQL analytics, such as
Impala, Drill, or BigSQL. Even Facebook, which created and still heavily
uses Hive, also uses Presto internally as its MPP query engine for BI.

For more details, see Cloudera's SQL-on-Hadoop webinar, which covers
when to use Hive, Impala, and Spark (SQL):
<http://www.cloudera.com/resources/recordedwebinar/hive-impala-and-spark-oh-my-sql-on-Hadoop-in-cloudera-5-5.html>


Support for local variables and stored procedures in Hive is included in
the HPL/SQL module of Hive 2.0. However, this is an experimental feature. We
will evaluate it for production-readiness before including it in CDH Hive.

Finally, HBase is typically not the best storage manager for migrations
from commercial DWs to Big Data. Most commercial DW migrations use HDFS
rather than HBase as the storage manager.

Hope this helps.

Thank you
Naveen

On Mon, Apr 18, 2016 at 6:34 PM, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Hi,
>
> I notice that Impala is rarely mentioned these days.  I may be missing
> something. However, I gather it is coming to an end now, as I don't recall many
> use cases for it (or customers asking for it). In contrast, Hive has held
> its ground with the new addition of Spark and Tez as execution engines,
> support for ACID and ORC, and new stuff in Hive 2. In addition, provided a
> good choice is made for its metastore, it scales well.
>
> If Hive had native (organic) support for local variables and stored
> procedures, then it would be a top-notch Data Warehouse. Given its
> metastore, I don't see any technical reason why it cannot support these
> constructs.
>
> I was recently asked to comment on migration from commercial DWs to Big
> Data (primarily for TCO reasons) and really could not recall any better
> candidate than Hive. Is HBase a viable alternative? Obviously, whatever one
> decides, there is still HDFS, a good engine for Hive (sounds like many
> prefer Tez, although I am a Spark fan) and the ubiquitous YARN.
>
> Let me know your thoughts.
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
