Fwd: How to minimize shuffling on Spark dataframe Join?

2015-08-11 Thread Abdullah Anwar
I have two DataFrames like this:

student_rdf = (studentid, name, ...)
student_result_rdf = (studentid, gpa, ...)

We need to join these two DataFrames. We are currently doing it like this:

student_rdf.join(student_result_rdf, student_result_rdf["studentid"] == student_rdf["studentid"])

So it is simple.
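The following is a minimal pure-Python model of where the shuffle in an equi-join like the one above comes from. It is not Spark code and simplifies Spark's internals. Each side is redistributed by a hash of studentid so that equal keys land in the same partition. That redistribution is the shuffle. If both sides were already partitioned the same way, only the local per-partition join would remain. The sample rows, the helper names, and NUM_PARTITIONS are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows mirroring the schemas in the question:
# student_rdf = (studentid, name, ...), student_result_rdf = (studentid, gpa, ...)
students = [(1, "Alice"), (2, "Bob"), (3, "Chen")]
results = [(1, 3.7), (3, 3.2), (4, 2.9)]

NUM_PARTITIONS = 4  # hypothetical; plays the role of spark.sql.shuffle.partitions


def partition_by_key(rows, n):
    """Route each row to partition hash(key) % n, as a hash partitioner does."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[0]) % n].append(row)
    return parts


def partitioned_inner_join(left_parts, right_parts):
    """Join partition i of the left side only with partition i of the right side.

    This is correct only because both sides used the same partitioner,
    so equal keys always end up at the same partition index.
    """
    out = []
    for left, right in zip(left_parts, right_parts):
        index = defaultdict(list)  # studentid -> list of gpa values
        for key, gpa in right:
            index[key].append(gpa)
        for key, name in left:
            for gpa in index[key]:
                out.append((key, name, gpa))
    return out


joined = partitioned_inner_join(
    partition_by_key(students, NUM_PARTITIONS),
    partition_by_key(results, NUM_PARTITIONS),
)
print(sorted(joined))  # [(1, 'Alice', 3.7), (3, 'Chen', 3.2)]
```

The expensive part in a real cluster is `partition_by_key`, because rows move across the network. When one side is small, Spark can broadcast it instead (governed by `spark.sql.autoBroadcastJoinThreshold`), which avoids shuffling the larger side.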

Re: How to minimize shuffling on Spark dataframe Join?

2015-08-12 Thread Abdullah Anwar
> ...is: how to co-group data from two DataFrames based on a key? I think for RDDs, cogroup in PairRDDFunctions is a way. I am not sure if something similar is available for DataFrames.
>
> Hemant
>
> On Tue, Aug 11, 2015 at 2:14 PM, Abdullah Anwar <abd
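The result shape of the `cogroup` operation Hemant mentions can be sketched in plain Python. This models only the output, not Spark's distributed implementation. In PySpark, one possible route is to drop a DataFrame to a pair RDD, e.g. `df.rdd.map(lambda r: (r.studentid, r))`, and call `RDD.cogroup` on two such RDDs. Spark 3.0 and later also provide `df1.groupby(...).cogroup(df2.groupby(...)).applyInPandas(...)`. The `cogroup` function and sample data below are hypothetical illustrations.

```python
from collections import defaultdict


def cogroup(left, right):
    """Group two lists of (key, value) pairs by key.

    Mirrors the shape of RDD.cogroup: every key seen on either side maps to
    (values_from_left, values_from_right); a side with no match gets [].
    """
    grouped = defaultdict(lambda: ([], []))
    for key, value in left:
        grouped[key][0].append(value)
    for key, value in right:
        grouped[key][1].append(value)
    return dict(grouped)


# Hypothetical (studentid, value) pairs.
names = [(1, "Alice"), (2, "Bob")]
gpas = [(1, 3.7), (1, 3.9), (3, 3.2)]

for key, (left_vals, right_vals) in sorted(cogroup(names, gpas).items()):
    print(key, left_vals, right_vals)
# 1 ['Alice'] [3.7, 3.9]
# 2 ['Bob'] []
# 3 [] [3.2]
```

Note that cogroup still needs both inputs grouped by key. On RDDs, it avoids a shuffle only when both sides already share the same partitioner.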