Re: Spark SQL, dataframe join questions.

shyla deshpande Wed, 29 Mar 2017 09:33:54 -0700

On Tue, Mar 28, 2017 at 2:57 PM, shyla deshpande <deshpandesh...@gmail.com>
wrote:


> Following are my questions. Thank you.
>
> 1. When joining dataframes, is it a good idea to repartition on the key
> column used in the join, or is the optimizer smart enough that this is
> unnecessary?
>
> 2. In an RDD join, wherever possible we do a reduceByKey before the join to
> avoid a big shuffle of data. Do we need to do anything similar with
> dataframe joins, or is the optimizer smart enough that this is unnecessary?
>
>
