Hi all,
I have a PySpark SQL script that loads one 80 MB table, one 2 MB table, and
three other small tables, and performs lots of joins to fetch the data.
My system configuration is 4 nodes, 300 GB of memory, 64 cores.
Writing a data frame of about 24 MB into a table takes the system 4 minutes.
> On Wednesday, September 30, 2020, Lakshmi Nivedita
> wrote:
>
>> Thank you for the clarification. How can I proceed with this kind of
>> scenario in PySpark?
>>
>> I have a scenario: subtracting the total number of days by the number of
>> holidays.
I have one table with the dates date1 and date2, and the holidays in another
table:

df1 = select date1, date2, ctry,
      unix_timestamp(date2 - date1) as totalnumberofdays - df2.holidays
      from tableA;
df2 = select count(holidate)
      from tableB
      where holidate >= tableA.date1
      and holidate <= date2;

The holiday-date column is not a unique key.
On Wed, Sep 30, 2020 at 6:05 PM Sean Owen wrote:
> No, you can't use the SparkSession from within a function executed by
> Spark tasks.
>
> On Wed, Sep 30, 2020 at 7:29 AM Lakshmi Nivedita
> wrote:
>
>> Here is a Spark UDF structure as an example:
>>
>> def sampl_fn(x):
>>     spark.sql("select count(Id) from sample where Id = x")
>>
>> spark.udf.register("sample_fn", sampl_fn)
>> spark.sql("select Id, sample_fn(Id) from example")
Thanks in advance for the help.
--
k.Lakshmi Nivedita