On Tue, Mar 28, 2017 at 2:57 PM, shyla deshpande <deshpandesh...@gmail.com> wrote:
> Following are my questions. Thank you.
>
> 1. When joining dataframes is it a good idea to repartition on the key column
> that is used in the join or the optimizer is too smart so forget it.
>
> 2. In RDD join, wherever possible we do reduceByKey before the join to avoid
> a big shuffle of data. Do we need to do anything similar with dataframe joins,
> or the optimizer is too smart so forget it.