Hi Ravi,

RDDs (and the DataFrames built on top of them) are always immutable, so you
cannot change them; instead, you create new ones by transforming an existing
one. Repartition is a transformation, so it is lazily evaluated and is
computed only when you call an action on the result.
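
To make this concrete, here is a minimal sketch rather than a definitive
recipe. It assumes a local SparkSession, and the object name
RepartitionSketch, the helper run, and the 100-row stand-in for your df are
made up for illustration:

```scala
import org.apache.spark.sql.SparkSession

object RepartitionSketch {
  // Hypothetical helper: returns (partitions of df, partitions of newdf, rows in newdf).
  def run(spark: SparkSession): (Int, Int, Long) = {
    // Stand-in for the df in the question: 100 rows in a single partition.
    val df = spark.range(0, 100, 1, 1).toDF("id")

    // Transformation: returns a new DataFrame that describes a shuffle into 3 partitions.
    // df itself is not modified.
    val newdf = df.repartition(3)

    val before = df.rdd.getNumPartitions    // partition count of the original
    val after  = newdf.rdd.getNumPartitions // partition count of the new DataFrame

    // Action: this forces the repartitioned data to actually be computed.
    val rows = newdf.count()

    (before, after, rows)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("repartition-sketch")
      .master("local[*]")
      .getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```

As far as I understand it, df and newdf are each a description of a
computation rather than a stored copy of the data, so unless you cache or
persist one of them, having both references around should not by itself mean
two copies in memory.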

Thanks.
Vamshi Talla

On Jul 8, 2018, at 12:26 PM, [email protected] wrote:

Hi,

Can anyone please clarify how repartition works?


  *   I have a DataFrame df which has only one partition:

// Returns 1
df.rdd.getNumPartitions

  *   I repartitioned it by passing 3 and assigned the result to a new
DataFrame, newdf:

val newdf = df.repartition(3)

  *   newdf shows 3 as the number of partitions:

// Returns 3
newdf.rdd.getNumPartitions

  *   df still shows 1:

// Returns 1
df.rdd.getNumPartitions

My questions are:

  1.  How does repartition work? Does it copy the original DataFrame and create
X partitions as specified by repartition? If that is the case, aren't there
two copies of the same data in memory, as shown in the diagram below?

Or is my understanding incorrect?

Judging by the executions above, it looks like there are two copies, since df
still has 1 partition after the repartition.

  2.  Is repartition executed immediately, or does it wait for some trigger
(some kind of action)?



<image001.png>

Thanks,
Ravi
