Re: Hive on Spark VS Spark SQL

2015-05-22 Thread Xuefu Zhang
Hi Cheolsoo, Thanks for the correction. I took that for granted and didn't actually check the code to verify. Yes, from the Spark version (1.2), I did see their parser etc. Below is a portion of the README from Spark's sql package for reference. Thanks, Xuefu Spark SQL is broken up into four sub

Re: Hive on Spark VS Spark SQL

2015-05-21 Thread Cheolsoo Park
Hi Xuefu, Thanks for the good comparison. I agree with most points, but #1 isn't true. SparkSQL has its own parser (implemented with the Scala parser combinator library), analyzer, and optimizer, although they're not as mature as Hive's. What it depends on Hive for is the Metastore, CliDriver, DDL parser, e

Re: Hive on Spark VS Spark SQL

2015-05-20 Thread Alexander Pivovarov
Thank you Xuefu! Excellent explanation and comparison! We should put it on the Hive on Spark wiki. https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark On Wed, May 20, 2015 at 10:45 AM, Xuefu Zhang wrote: > I have been working on Hive on Spark, and know a little about SparkSQL. > Here a

Re: Hive on Spark VS Spark SQL

2015-05-20 Thread Xuefu Zhang
I have been working on Hive on Spark, and know a little about SparkSQL. Here are a few factors to be considered: 1. SparkSQL is similar to Shark (discontinued) in that it clones Hive's front end (parser and semantic analyzer) and metastore, and injects in between a layer where Hive's operator tre

Re: Hive on Spark VS Spark SQL

2015-05-20 Thread Edward Capriolo
ed the available memory; Hive handles this sort of situation much more gracefully. If you have a smallish cluster and large data, this could pose a problem. Still, it’s worth looking into SparkSQL to see if this is still an issue.

Re: Hive on Spark VS Spark SQL

2015-05-20 Thread matshyeq
a problem. Still, it’s worth looking into SparkSQL to see if this is still an issue. -Chris Dragga *From:* Uli Bethke [mailto:uli.bet...@sonra.io] *Sent:* Wednesday, May 20, 2015 7:04 AM *To:* user@hive.apa

Re: Hive on Spark VS Spark SQL

2015-05-20 Thread Edward Capriolo
ose a problem. Still, it’s worth looking into SparkSQL to see if this is still an issue. -Chris Dragga *From:* Uli Bethke [mailto:uli.bet...@sonra.io] *Sent:* Wednesday, May 20, 2015 7:04 AM *To:* user@hive.apache.org *Subject:* Re:

RE: Hive on Spark VS Spark SQL

2015-05-20 Thread Dragga, Christopher
ose a problem. Still, it's worth looking into SparkSQL to see if this is still an issue. -Chris Dragga From: Uli Bethke [mailto:uli.bet...@sonra.io] Sent: Wednesday, May 20, 2015 7:04 AM To: user@hive.apache.org Subject: Re: Hive on Spark VS Spark SQL Interesting question and one that I

Re: Hive on Spark VS Spark SQL

2015-05-20 Thread Uli Bethke
Interesting question and one that I have asked myself. If you are already heavily invested in the Hive ecosystem in terms of code and skills, I would look at Hive on Spark as my engine. In theory swapping out engines (MR, TEZ, Spark) should be easy, even though the devil is in the detail. SparkS
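
For readers following the engine-swapping point in the message above, a minimal sketch of what switching engines can look like in a Hive session. It assumes a Hive build that includes the Spark and Tez execution engines; the table name `sample_table` is hypothetical and used only for illustration.

```sql
-- Sketch only: assumes a Hive session with MR, Tez, and Spark engines available.
-- hive.execution.engine accepts mr, tez, or spark.
SET hive.execution.engine=spark;

-- The query text itself does not change with the engine.
-- sample_table is a hypothetical table name.
SELECT COUNT(*) FROM sample_table;

-- Switch back to MapReduce for comparison.
SET hive.execution.engine=mr;
```

The setting applies per session, so the same query can be run under each engine to compare behaviour. Cluster-side setup for each engine is still required and is not shown here.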