qry_1 and qry_2 are simple SELECT queries with a GROUP BY clause. Are there any specific queries that behave differently between Spark SQL and the DataFrame API?
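One way to check whether the two cases really compile to the same thing is to compare the plans that Spark prints for each. The following is only a minimal sketch: `normalized_plan` and `plan_diff` are hypothetical helper names, and it assumes that `DataFrame.explain` in PySpark prints the plan to Python's stdout so that it can be captured.

```python
import contextlib
import difflib
import io
import re


def normalized_plan(df):
    """Capture what df.explain(True) prints and mask expression IDs."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        df.explain(True)  # assumption: the plans are printed to Python stdout
    # Expression IDs such as `#123` or `#123L` differ between plans; mask them
    # so that only structural differences remain.
    return re.sub(r"#\d+L?", "#N", buf.getvalue())


def plan_diff(df_a, df_b):
    """Return a unified diff of the two normalized plans ('' if identical)."""
    a = normalized_plan(df_a).splitlines()
    b = normalized_plan(df_b).splitlines()
    return "\n".join(
        difflib.unified_diff(a, b, "case_1", "case_2", lineterm="")
    )
```

For example, `plan_diff(spark.sql(<WITH ... UNION ALL query>), df1.union(df2))` would compare the SELECT part of the two cases. It does not cover the write step (CREATE TABLE ... AS versus saveAsTable), which would have to be inspected separately.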
Regards,
Neeraj

On Sat, Mar 30, 2019 at 7:27 PM Jason Nerothin <jasonnerot...@gmail.com> wrote:

> Can you please quantify the difference and provide the query code?
>
> On Fri, Mar 29, 2019 at 9:11 AM neeraj bhadani <bhadani.neeraj...@gmail.com> wrote:
>
>> Hi Team,
>> I am executing the same Spark code using the Spark SQL API and the DataFrame
>> API; however, Spark SQL is taking longer than expected.
>>
>> PFB pseudocode.
>>
>> -----------------------------------------------------------------------------------------------
>> Case 1 : Spark SQL
>> -----------------------------------------------------------------------------------------------
>> %sql
>> CREATE TABLE <tbl_name>
>> AS
>> WITH <table_1> AS (
>>   <qry1>
>> )
>> ,<table_2> AS (
>>   <qry2>
>> )
>> SELECT * FROM <table_1>
>> UNION ALL
>> SELECT * FROM <table_2>
>>
>> -----------------------------------------------------------------------------------------------
>> Case 2 : DataFrame API
>> -----------------------------------------------------------------------------------------------
>> df1 = spark.sql(<qry1>)
>> df2 = spark.sql(<qry2>)
>> df3 = df1.union(df2)
>> df3.write.saveAsTable(<table_name>)
>> -----------------------------------------------------------------------------------------------
>>
>> As per my understanding, both Spark SQL and the DataFrame API generate the
>> same code under the hood, so the execution time should be similar.
>>
>> Regards,
>> Neeraj
>
> --
> Thanks,
> Jason