While I've not experimented with the most recent versions of SparkSQL, earlier 
releases could not cope with intermediate result sets that exceeded the 
available memory; Hive handles this sort of situation much more gracefully.  If 
you have a smallish cluster and large data, this could pose a problem.  Even so, 
it's worth checking whether recent SparkSQL releases still have this limitation.

-Chris Dragga

From: Uli Bethke [mailto:uli.bet...@sonra.io]
Sent: Wednesday, May 20, 2015 7:04 AM
To: user@hive.apache.org
Subject: Re: Hive on Spark VS Spark SQL

Interesting question, and one that I have asked myself. If you are already 
heavily invested in the Hive ecosystem in terms of code and skills, I would look 
at Hive on Spark as my engine. In theory, swapping out engines (MR, Tez, Spark) 
should be easy, even though the devil is in the detail.
SparkSQL supports a broad subset of HiveQL (some esoteric features are not 
supported). Crucially, in my opinion, SparkSQL 1.4 will also introduce windowing 
functions. If starting out on a greenfield site, I would look exclusively at 
SparkSQL.
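
To make the engine swap concrete, here is a minimal sketch, assuming a Hive 
release that ships the Spark engine (Hive 1.1 or later) and a cluster where the 
chosen engine is actually installed and configured. The employees table and its 
columns are hypothetical, used only for illustration; the query is the kind of 
windowed HiveQL that depends on the windowing support mentioned above.

```sql
-- Choose the execution engine for the current Hive session.
-- Valid values are mr, tez and spark; the engine must be set up on the cluster.
SET hive.execution.engine=spark;

-- Hypothetical table: employees(dept STRING, name STRING, salary DOUBLE).
-- Rank employees by salary within each department using a window function.
SELECT dept,
       name,
       salary,
       RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS salary_rank
FROM employees;
```

The query text stays the same whichever engine is selected; only the SET line 
changes, which is what makes the swap easy in principle.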

On 20/05/2015 06:38, guoqing0...@yahoo.com.hk wrote:
Hive on Spark or SparkSQL: which is better, and what are the key 
characteristics, advantages, and disadvantages of each?

________________________________
guoqing0...@yahoo.com.hk

--
___________________________
Uli Bethke
Co-founder Sonra
p: +353 86 32 83 040
w: www.sonra.io
l: linkedin.com/in/ulibethke
t: twitter.com/ubethke

Chair Hadoop User Group Ireland:
http://www.meetup.com/hadoop-user-group-ireland/
