Re: How to minimize shuffling on Spark dataframe Join?

2015-08-12 Thread Abdullah Anwar
from two dataframes based on a key? I think for RDDs, cogroup in PairRDDFunctions is one way. I am not sure if something similar is available for DataFrames.
Hemant
On Tue, Aug 11, 2015 at 2:14 PM, Abdullah Anwar abdullah.ibn.an...@gmail.com wrote: I have two dataframes like

Fwd: How to minimize shuffling on Spark dataframe Join?

2015-08-11 Thread Abdullah Anwar
I have two dataframes like this:
student_rdf = (studentid, name, ...)
student_result_rdf = (studentid, gpa, ...)
We need to join these two dataframes. We are currently doing it like this:
student_rdf.join(student_result_rdf, student_result_rdf["studentid"] == student_rdf["studentid"])
So it is simple.
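For anyone reading this thread later, here is a rough sketch of why a key-based join tends to shuffle, and when it would not need to. It is plain Python, not the Spark API: the helper names (partition_by_key, join_copartitioned), the partition count, and the sample rows are all made up for illustration. The assumption is that if both sides are already hash-partitioned on studentid with the same partition count, matching rows sit in the same partition number and can be joined locally.

```python
# Conceptual sketch in plain Python (NOT the Spark API): a hash-partitioned join.
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_by_key(rows, num_partitions=NUM_PARTITIONS):
    """Place each (key, value) row into a partition chosen by hashing the key."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for key, value in rows:
        partitions[hash(key) % num_partitions][key].append(value)
    return partitions


def join_copartitioned(left_parts, right_parts):
    """Inner-join partition i of the left side with partition i of the right side.

    Because both sides used the same hash function and partition count,
    no row has to move to another partition to find its match.
    """
    joined = []
    for left, right in zip(left_parts, right_parts):
        for key, left_values in left.items():
            for left_value in left_values:
                for right_value in right.get(key, []):
                    joined.append((key, left_value, right_value))
    return joined


# Hypothetical sample rows modeled on the two dataframes in the post:
# (studentid, name) and (studentid, gpa).
students = [(1, "Alice"), (2, "Bob"), (3, "Carol")]
results = [(1, 3.8), (2, 3.5), (4, 2.9)]

rows = sorted(join_copartitioned(partition_by_key(students),
                                 partition_by_key(results)))
print(rows)
```

If the two sides were partitioned differently, one of them would first have to be re-partitioned by studentid, and that step is what a shuffle amounts to. Whether Spark can skip it for a given DataFrame join depends on the version and on how the data was produced, so treat this only as a mental model.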