Hi, I need to split an RDD into three different RDDs using the filter transformation. I cached the original RDD before applying the filters. The input is lopsided, leaving some executors heavily loaded and others lightly loaded, so I have repartitioned it.
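For reference, here is a minimal sketch of the kind of pipeline described above, not the actual job. Everything in it is an assumption made for illustration: the `SplitCachedRdd` object, the generated `(key, value)` input, the partition count of 8, the `local[4]` master and the three per-branch map functions. It only shows one way to wire the steps so that all three filters start from the same repartitioned, cached RDD reference.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical stand-in for the real job; names and data are illustrative only.
object SplitCachedRdd {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("split-cached-rdd").setMaster("local[4]")
    val sc = new SparkContext(conf)

    // Assumed input: (key, value) pairs standing in for the real records.
    val input: RDD[(Int, String)] =
      sc.parallelize(1 to 1000).map(i => (i % 3, s"record-$i"))

    // Repartition once to even out the load, apply the map step, and mark
    // that exact RDD for caching. cache() is lazy: nothing is stored until
    // the first action runs.
    val balanced: RDD[(Int, String)] =
      input.repartition(8).map { case (k, v) => (k, v.trim) }.cache()

    // All three branches are built from the same cached reference.
    val part1: RDD[String] = balanced.filter(_._1 == 0).map(_._2.toUpperCase)
    val part2: RDD[String] = balanced.filter(_._1 == 1).map(_._2.reverse)
    val part3: RDD[String] = balanced.filter(_._1 == 2).map(_._2.toLowerCase)

    // Union the branches and run a single action over the result.
    val output: RDD[String] = sc.union(Seq(part1, part2, part3))
    println(output.count())

    // Print the lineage and the storage level requested for the shared RDD.
    println(output.toDebugString)
    println(balanced.getStorageLevel)

    sc.stop()
  }
}
```

The output of `toDebugString` can be compared against the DAG in the Spark UI, and `getStorageLevel` reports whether caching was requested on the shared RDD. Whether this sketch reproduces the repeated shuffle depends on how the real job is wired, which the description does not show.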
*DAG lineage I expected:*

I/P RDD --> MAP RDD --> SHUFFLE RDD (repartition) --> *MAP RDD (cache)* --> FILTER RDD1 --> MAP1 --> UNION RDD --> O/P RDD
                                                                        --> FILTER RDD2 --> MAP2
                                                                        --> FILTER RDD3 --> MAP3

*DAG lineage I observed:*

I/P RDD --> MAP RDD --> SHUFFLE RDD (repartition) --> *MAP RDD (cache)* --> FILTER RDD1 --> MAP1
                        SHUFFLE RDD (repartition) --> *MAP RDD (cache)* --> FILTER RDD2 --> MAP2
                        SHUFFLE RDD (repartition) --> *MAP RDD (cache)* --> FILTER RDD3 --> MAP3 --> UNION RDD --> O/P RDD

Also, the Spark UI shows that no RDD partitions are actually being cached.

How do I split the RDD without shuffling three times?

Regards,
Sushrut Ikhar
about.me/sushrutikhar <https://about.me/sushrutikhar?promo=email_sig>