Hi Ravi, RDDs are always immutable, so you cannot change them, instead you create new ones by transforming one. Repartition is a transformation, so it lazily evaluated, hence computed only when you call an action on it.
Thanks. Vamshi Talla On Jul 8, 2018, at 12:26 PM, <[email protected]<mailto:[email protected]>> <[email protected]<mailto:[email protected]>> wrote: Hi, Can anyone clarify how repartition works please ? * I have a DataFrame df which has only one partition: // Returns 1 df.rdd.getNumPartitions • I repartitioned it by passing “3” and assigned it a new DataFrame newdf val newdf = df.repartition(3) • newdf shows 3 as number of partitions // Returns 3 newdf.rdd.getNumPartitions • df still shows 1 // Return 1 df.rdd.getNumPartitions My Question is that, 1. How does repartition work ? Does it copy original dataframe and create X partitions as specified by repartition ? If that is the case, aren’t there two copies of same data in memory as shown in below diagram ? Or my understanding is incorrect ? As per executions above, looks like there are two copies as after repartition, df still has 1 partition !! 1. Repartition is executed immediately or it waits for some trigger [kind of action] ? <image001.png> Thanks, Ravi
