Re: Hive and Impala

Ashok Kumar Tue, 01 Mar 2016 13:39:43 -0800

Dr Mich,
My two cents here.
I don't have direct experience of Impala, but I share your view that Hive 
provides the best metastore of all Big Data systems. Looking around, almost 
every product uses Hive code somewhere in one form or another. My colleagues 
inform me that Hive is one of the most stable Big Data products.
With Spark on Hive, and Hive on Spark or Tez plus of course MR, there is 
really little need for many other products in the same space. It is good to 
keep things simple.
Warmest


    On Tuesday, 1 March 2016, 11:33, Mich Talebzadeh 
<[email protected]> wrote:
 

I have not heard much about Impala lately. I saw an article on LinkedIn titled
"Apache Hive Or Cloudera Impala? What is Best for me?", which stated:
"We can access all objects from Hive data warehouse with HiveQL which leverages 
the map-reduce architecture in background for data retrieval and transformation 
and this results in latency." 
My response was:
This statement is no longer valid, as you now have a choice of three engines: 
MR, Spark and Tez. I have not used Impala myself, as I don't think there is a 
need for it when Hive on Spark, or Spark using the Hive metastore, provides 
whatever is needed. Hive is for Data Warehousing and does what it says on the 
tin. Please also bear in mind that Hive offers ORC storage files, which 
provide storage index capabilities that further optimize queries with 
additional stats at the file, stripe and row group levels. 
Anyway, the question is: with Hive on Spark, or Spark using the Hive 
metastore, what can we not achieve that we can achieve with Impala?
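
To make the engine choice and the ORC point above concrete, here is a minimal sketch in Python that only builds the HiveQL text involved. It assumes the standard Hive property hive.execution.engine with the values mr, tez and spark, and the STORED AS ORC clause. The helper names and the "sales" table are hypothetical, made up for illustration, and nothing here connects to a Hive server.

```python
# Hypothetical helpers: they only build HiveQL strings and never
# connect to Hive. The table and column names are made up.

VALID_ENGINES = ("mr", "tez", "spark")


def engine_statement(engine: str) -> str:
    """Return the HiveQL SET statement that selects the execution engine."""
    if engine not in VALID_ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    return f"SET hive.execution.engine={engine};"


def orc_table_ddl(table: str, columns: dict) -> str:
    """Return a CREATE TABLE statement that stores the table as ORC."""
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns.items())
    return f"CREATE TABLE {table} ({cols}) STORED AS ORC;"


if __name__ == "__main__":
    # Print the statements one would paste into a Hive session.
    print(engine_statement("tez"))
    print(orc_table_ddl("sales", {"id": "INT", "amount": "DOUBLE"}))
```

Switching engines is then a one-line change per session, while the table definition and the queries against it stay the same.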

Dr Mich Talebzadeh LinkedIn  
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
 http://talebzadehmich.wordpress.com
