Re: Hive and Impala

2016-03-02 Thread Jörn Franke
It always depends on what you want to do and thus from experience I cannot agree with your comment. Do you have any reasoning for this statement? > On 02 Mar 2016, at 19:14, Dayong wrote: > > Tez is kind of outdated and Orc is so dedicated on hive. In addition, hive > metadata store can be de

Re: Hive and Impala

2016-03-02 Thread Mich Talebzadeh
OK, two questions here please: 1. Which version of Hive are you running? 2. Have you tried Hive on Spark, which does both DAG and in-memory calculation? Query Hive on Spark job[1] stages: INFO : 2 INFO : 3 HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=

Re: Hive and Impala

2016-03-02 Thread Dayong
Tez is kind of outdated and ORC is too specific to Hive. In addition, the Hive metadata store can be decoupled from Hive as well. In reality, we do suffer from Hive's performance even for ETL jobs. As a result, we'll switch to Impala + Spark/Flink. Thanks, Dayong > On Mar 2, 2016, at 10:35 AM, Mic

Re: Hive and Impala

2016-03-02 Thread Mich Talebzadeh
I forgot: besides LLAP, you are going to have Hive Hybrid Procedural SQL On Hadoop (HPL/SQL), which is going to add another dimension to Hive. Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCC

Re: Hive and Impala

2016-03-02 Thread Mich Talebzadeh
SQL plays an increasingly important role on Hadoop. As of today Hive IMO provides the best and most robust solution to anything resembling a Data Warehouse "solution" on Hadoop, chiefly by means of its powerful metastore which can be hosted on a variety of mission critical databases plus Hive's ever

Re: Hive and Impala

2016-03-02 Thread Jörn Franke
I think you can always make a benchmark that shows this or that result. You always have to see what is evaluated, and generally I recommend always trying it yourself with your data and your queries. There is also a lot of change within the projects. Impala may have Kudu, but Hive has ORC, Tez and Spa

Re: Hive and Impala

2016-03-02 Thread Dayong
As I remember from the Hadoop weekly news feed a few weeks ago, Cloudera has a benchmark showing Impala is a little better than Spark SQL and Hive with Tez. You can check that. From my experience, Hive is still the leading tool for regular ETL jobs since it is stable. The other tools are better for adh

Re: Hive and Impala

2016-03-01 Thread Edward Capriolo
My knocks on Impala (not intended to be a post knocking Impala): Impala really has not delivered on the complex types that Hive has (after promising them for quite a while); also, it only works with the 'blessed' input formats: Parquet, Avro, text. It is very annoying to work with Impala. In my versi

Re: Hive and Impala

2016-03-01 Thread Ashok Kumar
Dr Mich, My two cents here. I don't have direct experience of Impala, but in my humble opinion I share your view that Hive provides the best metastore of all Big Data systems. Looking around, almost every product in one form or another uses Hive code somewhere. My colleagues inform me that Hive i

Re: Hive and Impala

2016-03-01 Thread Mich Talebzadeh
Just to clarify the statement in quotes was made by the author of the article "We can access all objects from Hive data warehouse with HiveQL which leverages the map-reduce architecture in background for data retrieval and transformation and this results in latency." Dr Mich Talebzadeh LinkedI

Hive and Impala

2016-03-01 Thread Mich Talebzadeh
I have not heard of Impala anymore. I saw an article on LinkedIn titled "Apache Hive Or Cloudera Impala? What is Best for me?" "We can access all objects from Hive data warehouse with HiveQL which leverages the map-reduce architecture in background for data retrieval and transformation and this r

RE: Hive and Impala

2015-04-27 Thread Mich Talebzadeh
From: Moore, Douglas [mailto:douglas.mo...@thinkbiganalytics.com] Sent: 27 April 2015 14:10 To: user@hive.apache.org Subject: R

Re: Hive and Impala

2015-04-27 Thread Moore, Douglas
Hive is great for the massive transformations needed in ETL-type processing and full data set analytics. Impala is better suited for fast analytical queries returning a tiny subset of the original data set. Both are improving in terms of concurrency and latency; however, they have a long way to go to

Re: Hive and Impala

2015-04-27 Thread Anilkumar Kalshetti
Hi Ashok, You can now also use Spark as the execution engine for Hive. Please check the HiveOnSpark [HoS] project. Ref Link . Thanks On 27 April 2015 at 15:22, Fabio C. wrote: > If the comparison mention just MR, then

Re: Hive and Impala

2015-04-27 Thread Fabio C.
If the comparison mentions just MR, then it is probably outdated. Hive can now run on Tez with a great improvement in performance. However, I don't know about Hive+Tez vs Impala. On Mon, Apr 27, 2015 at 10:50 AM, Nitin Pawar wrote: > What use case are you trying to solve? > > On Mon, Apr 27, 2015 at

Re: Hive and Impala

2015-04-27 Thread Nitin Pawar
What use case are you trying to solve? On Mon, Apr 27, 2015 at 2:16 PM, Ashok Kumar wrote: > Hi gurus, > > Kindly help me understand the advantage that Impala has over Hive. > > I read a note that Impala does not use MapReduce engine and is therefore > very fast for queries compared to Hive. How

Hive and Impala

2015-04-27 Thread Ashok Kumar
Hi gurus, Kindly help me understand the advantage that Impala has over Hive. I read a note that Impala does not use the MapReduce engine and is therefore very fast for queries compared to Hive. However, Hive, as I understand it, is widely used everywhere! Thank you