While I've not experimented with the most recent versions of SparkSQL, earlier releases could not cope with intermediate result sets that exceeded the available memory; Hive handles this sort of situation much more gracefully. If you have a smallish cluster and large data, this could pose a problem. Even so, it's worth looking into SparkSQL to see whether this remains an issue.

-Chris Dragga

From: Uli Bethke [mailto:uli.bet...@sonra.io]
Sent: Wednesday, May 20, 2015 7:04 AM
To: user@hive.apache.org
Subject: Re: Hive on Spark VS Spark SQL

Interesting question and one that I have asked myself.

If you are already heavily invested in the Hive ecosystem in terms of code and skills, I would look at Hive on Spark as my engine. In theory, swapping out engines (MR, Tez, Spark) should be easy, though the devil is in the detail.

SparkSQL supports a broad subset of HiveQL (some esoteric features are not supported). Crucially, in my opinion, SparkSQL 1.4 will also introduce windowing functions. If starting out on a greenfield site, I would exclusively look at SparkSQL.

On 20/05/2015 06:38, guoqing0...@yahoo.com.hk wrote:

Hive on Spark and SparkSQL: which is better, and what are the key characteristics, advantages, and disadvantages of each?

________________________________
guoqing0...@yahoo.com.hk

--
Uli Bethke
Co-founder Sonra
p: +353 86 32 83 040
w: www.sonra.io
l: linkedin.com/in/ulibethke
t: twitter.com/ubethke
Chair Hadoop User Group Ireland: http://www.meetup.com/hadoop-user-group-ireland/