Thanks a lot, Roman. But the link you provided describes several ways to deal with the problem. Why do we need to operate on an RDD instead of a DataFrame/Dataset?
Do I need a custom partitioner in my case, and if so, how do I invoke it in spark-sql? Can anyone provide a sample of handling skewed data with spark-sql? Thanks, Shyam