Hi, 

 

Can anyone clarify how repartition works, please?

 

*       I have a DataFrame df which has only one partition:

 

// Returns 1
df.rdd.getNumPartitions



*         I repartitioned it by passing 3 and assigned the result to a new
DataFrame, newdf:


val newdf = df.repartition(3)



 

*         newdf shows 3 as the number of partitions:
// Returns 3
newdf.rdd.getNumPartitions



*         df still shows 1


// Returns 1
df.rdd.getNumPartitions
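
In case it helps anyone reproduce this, here is a minimal self-contained sketch of the steps above. It assumes a local SparkSession and uses spark.range as a stand-in for my real df; the object name RepartitionDemo, the method partitionCounts and the "id" column are placeholders I made up for illustration, not part of my actual job.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RepartitionDemo {
  // Returns (partitions of df before, partitions of newdf, partitions of df afterwards).
  def partitionCounts(spark: SparkSession): (Int, Int, Int) = {
    // Hypothetical stand-in for the real df: 100 rows placed in a single partition.
    val df: DataFrame = spark.range(0L, 100L, 1L, 1).toDF("id")
    val before = df.rdd.getNumPartitions

    // repartition returns a new DataFrame; df itself is not modified.
    val newdf = df.repartition(3)

    (before, newdf.rdd.getNumPartitions, df.rdd.getNumPartitions)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, only for trying this out on one machine.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("repartition-demo")
      .getOrCreate()
    try {
      println(partitionCounts(spark))
    } finally {
      spark.stop()
    }
  }
}
```

The three values correspond to the three getNumPartitions calls in my steps: df before the repartition, newdf, and df again afterwards.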

 

My questions are:

 

1.      How does repartition work? Does it copy the original DataFrame and
create the X partitions specified in the repartition call? If that is the
case, aren't there two copies of the same data in memory, as shown in the
diagram below?

Or is my understanding incorrect?

 

Judging by the executions above, it looks like there are two copies, since
df still has 1 partition after the repartition.

 

2.      Is repartition executed immediately, or does it wait for some trigger
(some kind of action)?

 

 



 

Thanks,

Ravi
