Serialization issue when using Spark 3.1.2 with Hadoop YARN

2021-10-03 Thread davvy benny
My Spark job fails with this error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (davben-lubuntu executor 2): java.lang.ClassCastException: cannot assign instance of java.lang.invok

[Spark] Optimize spark join on different keys for same data frame

2021-10-03 Thread Amit Joshi
Hi Spark users, I hope you are doing well. I have been working on cases where a DataFrame is frequently joined with more than one other DataFrame separately, on different columns. I was wondering how to optimize these joins to make them faster. We can consider the dataset to be big in size so broadc