Re: Spark SQL API taking longer time than DF API.

chris Mon, 08 Apr 2019 01:31:29 -0700

Hi,

Without more information it’s very difficult to work out what’s going on. If 
possible, could you do the following and make the results available to us?


1) For each query, call explain() and post the output.

2) Run each query and then go to the SQL tab in the Spark UI. For each query, 
show us the plan.

3) For each query, post some screenshots of the tasks page in the Spark UI.

(In all of the above, make sure to redact any sensitive information!)
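Comparing two explain() outputs by eye is error-prone, because Spark tags every attribute with an expression ID (for example `id#12L`) that changes from run to run, so even logically identical plans differ as text. A minimal sketch, assuming you already have each plan as a string (for instance pasted from `explain(True)` output): mask the IDs, then diff what remains. `normalize_plan` and `diff_plans` are hypothetical helper names, not part of Spark.

```python
import difflib
import re
from typing import List

# Spark labels attributes with per-session expression IDs such as "id#12L"
# or "name#7". Mask the digits so that only real structural differences remain.
_EXPR_ID = re.compile(r"#\d+")


def normalize_plan(plan: str) -> List[str]:
    """Return the plan's non-blank lines with expression IDs masked as '#N'."""
    return [
        _EXPR_ID.sub("#N", line).rstrip()
        for line in plan.splitlines()
        if line.strip()
    ]


def diff_plans(plan_a: str, plan_b: str) -> List[str]:
    """Return unified-diff lines between two normalized plans (empty if they match)."""
    return list(
        difflib.unified_diff(
            normalize_plan(plan_a),
            normalize_plan(plan_b),
            fromfile="case1_sql",
            tofile="case2_dataframe",
            lineterm="",
        )
    )
```

An empty result means the two plans have the same shape once IDs are ignored; any remaining `-`/`+` lines point at where they diverge. If you prefer to capture the plan text programmatically rather than copy it, `spark.sql("EXPLAIN EXTENDED <query>").collect()[0][0]` should return it as a single string.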

You are right in thinking that the queries should be identical. My hunch 
is that something subtle is making them not quite identical, and the above 
information should allow us to figure out what.
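One subtle difference worth ruling out (an assumption, not something confirmed in this thread): depending on the Spark version, platform and configuration, `CREATE TABLE ... AS SELECT` without a `USING` clause may create a Hive-format table (TextFile by default, unless `spark.sql.hive.convertCTAS` is enabled), while `df.write.saveAsTable()` uses `spark.sql.sources.default` (Parquet unless changed). The sketch below builds the Case 1 statement with the storage format pinned explicitly so both cases write the same format; `ctas_statement` and its parameters are hypothetical.

```python
def ctas_statement(table: str, qry1: str, qry2: str, fmt: str = "parquet") -> str:
    """Build the Case 1 CTAS with an explicit storage format.

    Hypothetical helper: it keeps the WITH ... UNION ALL shape of the original
    query but adds USING <fmt>, so the table format no longer depends on
    session or Hive defaults.
    """
    return (
        f"CREATE TABLE {table} USING {fmt} AS\n"
        f"WITH table_1 AS ({qry1}),\n"
        f"     table_2 AS ({qry2})\n"
        "SELECT * FROM table_1\n"
        "UNION ALL\n"
        "SELECT * FROM table_2"
    )
```

With this, `spark.sql(ctas_statement("<tbl_name>", qry1, qry2))` for Case 1 and `df3.write.format("parquet").saveAsTable("<table_name>")` for Case 2 should write the same format. Running `DESCRIBE FORMATTED` on the tables created so far should show which provider or SerDe each one actually used.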

Thanks,

Chris 



> On 8 Apr 2019, at 09:21, neeraj bhadani <[email protected]> wrote:
> 
> Hi All,
>     Can anyone help me here with my query?
> 
> Regards,
> Neeraj
> 
>> On Mon, Apr 1, 2019 at 9:44 AM neeraj bhadani <[email protected]> 
>> wrote:
>> In both cases, I am trying to create a Hive table based on a UNION of the 
>> same 2 queries.
>> 
>> I am not sure how the two approaches differ internally when creating the Hive table.
>> 
>> Regards,
>> Neeraj
>> 
>>> On Sun, Mar 31, 2019 at 1:29 PM Jörn Franke <[email protected]> wrote:
>>> Is the SELECT taking longer, or the saving to a file? You seem to save to a 
>>> file only in the second case.
>>> 
>>>> On 29 Mar 2019, at 15:10, neeraj bhadani 
>>>> <[email protected]> wrote:
>>>> 
>>>> Hi Team,
>>>>    I am executing the same Spark code using the Spark SQL API and the DataFrame 
>>>> API; however, Spark SQL is taking longer than expected.
>>>> 
>>>> Please find the pseudo code below.
>>>> -----------------------------------------------------------------------------------------------
>>>> Case 1 : Spark SQL
>>>> -----------------------------------------------------------------------------------------------
>>>> %sql
>>>> CREATE TABLE <tbl_name>
>>>> AS
>>>> 
>>>> WITH <table_1> AS (
>>>>     <qry1>
>>>> ),
>>>> <table_2> AS (
>>>>     <qry2>
>>>> )
>>>> 
>>>> SELECT * FROM <table_1> 
>>>> UNION ALL
>>>> SELECT * FROM <table_2>
>>>> 
>>>> -----------------------------------------------------------------------------------------------
>>>> Case  2 : DataFrame API
>>>> -----------------------------------------------------------------------------------------------
>>>> 
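>>>> # Run each query on its own as a DataFrame
>>>> df1 = spark.sql(<qry1>)
>>>> df2 = spark.sql(<qry2>)
>>>> # union() keeps duplicates (like UNION ALL) and matches columns by position
>>>> df3 = df1.union(df2)
>>>> df3.write.saveAsTable(<table_name>)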
>>>> -----------------------------------------------------------------------------------------------
>>>> 
>>>> As per my understanding, both Spark SQL and the DataFrame API generate the 
>>>> same code under the hood, so the execution time should be similar.
>>>> 
>>>> Regards,
>>>> Neeraj
>>>>
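Jörn's question above (is the SELECT or the write taking longer?) can be answered by timing the two phases separately. A minimal sketch, assuming plain wall-clock timing from the driver is precise enough at this scale; `timed` is a hypothetical helper, not a Spark API.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def timed(label: str, action: Callable[[], T], results: Dict[str, float]) -> T:
    """Run action(), store its wall-clock duration in seconds under label, return its result.

    The duration is recorded even if action() raises, so a failed write
    still shows up in the results.
    """
    start = time.perf_counter()
    try:
        return action()
    finally:
        results[label] = time.perf_counter() - start
```

For example, with `df3` from Case 2 and Spark 3.0 or later (which added the `noop` sink that runs a query without writing any output), call `timed("select", lambda: df3.write.format("noop").mode("overwrite").save(), timings)` and then `timed("write", lambda: df3.write.saveAsTable("<table_name>"), timings)`. On older versions, an action such as `df3.count()` is only a rough proxy, because the optimizer can prune columns it does not need.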
