My two cents: this is a complicated question, because I'm not confident
that Spark is 100% compatible with Hive's query language. I have an
unanswered question on this list about exactly that:

http://apache-spark-user-list.1001560.n3.nabble.com/Should-SHOW-TABLES-statement-return-a-hive-compatible-output-td38577.html

One important thing to check is whether every object you use is supported
in both Hive and Spark. One example is the lack of support for
materialized views in Spark:
https://issues.apache.org/jira/browse/SPARK-29038
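
A rough way to run that check over existing scripts is a small text scan.
The sketch below is only an illustration: the function name and the pattern
table are made up for this example, and the table lists just the one
construct mentioned above (materialized views). It works line by line, so it
will miss statements split across lines and will also flag matches inside
comments.

```python
import re

# Hypothetical pattern table: only the construct named in this thread.
# Extend it with whatever your own testing shows Spark rejects.
UNSUPPORTED_PATTERNS = {
    "materialized view": re.compile(
        r"\b(?:CREATE|ALTER|DROP)\s+MATERIALIZED\s+VIEW\b", re.IGNORECASE
    ),
}


def find_unsupported(hql_text):
    """Return (line_number, label, stripped_line) for each flagged line."""
    hits = []
    for lineno, line in enumerate(hql_text.splitlines(), start=1):
        for label, pattern in UNSUPPORTED_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label, line.strip()))
    return hits
```

Running `find_unsupported(open(path).read())` over each HQL file gives a
quick list of lines to review by hand before moving a script to Spark.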

With that being said, I'd recommend going with option 2, as this will force
your code to use what Spark offers.
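
As a minimal sketch of what option 2 could look like in PySpark, assuming
the cluster's Spark is already configured to reach the Hive metastore: the
function names and the default application name below are placeholders
invented for this example, not anything from your scripts. It reads a Hive
table through the metastore and controls caching explicitly, which is the
kind of control the plain HQL route does not give you.

```python
def build_session(app_name="hive-to-spark-sketch"):
    """Create a SparkSession that resolves tables through the Hive metastore."""
    # Imported here so the rest of the sketch loads even without pyspark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .enableHiveSupport()  # use the Hive metastore for table lookups
        .getOrCreate()
    )


def count_and_sample(spark, table_name, sample_size=10):
    """Return (row_count, sample_rows) for a table, caching it while in use."""
    df = spark.table(table_name)
    df.cache()  # mark the DataFrame for caching (lazy until an action runs)
    try:
        total = df.count()  # first action; populates the cache
        sample = df.limit(sample_size).collect()  # can reuse cached data
    finally:
        df.unpersist()  # evict explicitly once we are done
    return total, sample
```

With a cluster available, this would be called as
`count_and_sample(build_session(), "mydb.events")`, where `mydb.events` is a
placeholder table name.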

Hope that helps.

On Tue, Oct 6, 2020 at 1:14 PM Manu Jacob <manu.ja...@sas.com.invalid>
wrote:

> Hi All,
>
>
>
> Not sure if I need to ask this question on spark community or hive
> community.
>
>
>
> We have a set of Hive scripts that run on EMR (Tez engine). We would like
> to experiment by moving some of them onto Spark. We are planning to
> experiment with two options.
>
>
>
>    1. Use the current code based on HQL, with engine set as spark.
>    2. Write pure spark code in scala/python using SparkQL and hive
>    integration.
>
>
>
> The first approach helps us transition to Spark quickly, but we are not
> sure it is the best approach in terms of performance. We could not find any
> reasonable comparisons of these two approaches. It looks like writing pure
> Spark code gives us more control to add logic and to control some of the
> performance features, for example caching and evicting.
>
>
>
>
>
> Any advice on this is much appreciated.
>
>
>
>
>
> Thanks,
>
> -Manu
>
>
>


-- 

Ricardo Martinelli De Oliveira

Data Engineer, AI CoE

Red Hat Brazil <https://www.redhat.com/>

Av. Brigadeiro Faria Lima, 3900

8th floor

rmart...@redhat.com    T: +551135426125
M: +5511970696531
