Hi,
Can anyone clarify how repartition works please ? * I have a DataFrame df which has only one partition: // Returns 1 df.rdd.getNumPartitions * I repartitioned it by passing "3" and assigned it a new DataFrame newdf val newdf = df.repartition(3) * newdf shows 3 as number of partitions // Returns 3 newdf.rdd.getNumPartitions * df still shows 1 // Return 1 df.rdd.getNumPartitions My Question is that, 1. How does repartition work ? Does it copy original dataframe and create X partitions as specified by repartition ? If that is the case, aren't there two copies of same data in memory as shown in below diagram ? Or my understanding is incorrect ? As per executions above, looks like there are two copies as after repartition, df still has 1 partition !! 2. Repartition is executed immediately or it waits for some trigger [kind of action] ? Thanks, Ravi