toDF internally calls SQLContext.createDataFrame, which transforms the RDD into an RDD[InternalRow]. This RDD[InternalRow] is then mapped to a DataFrame.
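
If you want to check this yourself, here is a minimal sketch. It assumes Spark 2.x or later, where SparkSession is the entry point (on 1.6 you would use `import sqlContext.implicits._` instead). The Person case class, the app name, and the local master are made up for illustration only. The idea is to look at the physical plan and the partition counts: if toDF only converts rows in place, the plan should contain no Exchange step and the partition count should stay the same.

```scala
import org.apache.spark.sql.SparkSession

object ToDFSketch {
  // Hypothetical record type, used only for this illustration.
  case class Person(name: String, age: Int)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, just for experimenting.
    val spark = SparkSession.builder()
      .appName("toDF-sketch")
      .master("local[2]")
      .getOrCreate()
    import spark.implicits._ // brings rdd.toDF() into scope

    // An RDD of plain Scala objects, explicitly split into 4 partitions.
    val rdd = spark.sparkContext.parallelize(
      Seq(Person("a", 1), Person("b", 2), Person("c", 3)),
      numSlices = 4
    )

    // Converts each element to a row; the schema is derived from the case class.
    val df = rdd.toDF()

    df.printSchema()
    // Print the physical plan; a shuffle would show up as an Exchange node.
    df.explain()

    // Compare partition counts before and after the conversion.
    println(s"RDD partitions:       ${rdd.getNumPartitions}")
    println(s"DataFrame partitions: ${df.rdd.getNumPartitions}")

    spark.stop()
  }
}
```

The two partition counts are expected to match, since the conversion is applied per partition.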
Type conversions (from Scala types to Catalyst types) are involved, but no shuffling.

Hemant Bhanawat <https://www.linkedin.com/in/hemant-bhanawat-92a3811>
www.snappydata.io

On Sun, Feb 21, 2016 at 11:48 AM, Weiwei Zhang <wzhan...@dons.usfca.edu> wrote:

> Hi there,
>
> Could someone explain to me what is behind the scene of rdd.toDF()? More
> importantly, will this step involve a lot of shuffles and cause the surge
> of the size of intermediate files? Thank you.
>
> Best Regards,
> Vivian