[Spark SQL] performance tuning

2020-11-12 Thread Lakshmi Nivedita
Hi all, I have a pyspark SQL script that loads one 80 MB table, one 2 MB table, and three other small tables, and performs lots of joins in the script to fetch the data. My system configuration is 4 nodes, 300 GB, 64 cores. Writing a data frame of 24 MB of records into a table takes 4 minutes…

Re: [Spark SQL] does pyspark udf support spark.sql inside def

2020-10-01 Thread Lakshmi Nivedita
On Wednesday, September 30, 2020, Lakshmi Nivedita wrote: Thank you for the clarification. I would like to know how I can proceed for this kind of scenario in pyspark. I have a scenario subtracting the total number of days with the number of holidays…

[Spark SQL]pyspark to count total number of days-no of holidays by using sql

2020-09-30 Thread Lakshmi Nivedita
I have a table with dates date1, date2 in one table and the number of holidays in another table:

df1 = select date1, date2, ctry, unix_timestamp(date2 - date1) totalnumberofdays - df2.holidays from table A;
df2 = select count(holidays) from table B where holidate >= 'date1' (table A) and holidate <= date…

Re: [Spark SQL] does pyspark udf support spark.sql inside def

2020-09-30 Thread Lakshmi Nivedita
…er column is not a unique key. On Wed, Sep 30, 2020 at 6:05 PM Sean Owen wrote: No, you can't use the SparkSession from within a function executed by Spark tasks. On Wed, Sep 30, 2020 at 7:29 AM Lakshmi Nivedita wrote: Here is a spark udf…

[Spark SQL] does pyspark udf support spark.sql inside def

2020-09-30 Thread Lakshmi Nivedita
Here is a spark udf structure as an example:

def sampl_fn(x):
    spark.sql("select count(Id) from sample where Id = x")

spark.udf.register("sample_fn", sample_fn)
spark.sql("select id, sampl_fn(Id) from example")

Advance thanks for the help -- k.Lakshmi Nivedita