Depending on what you do with them, they will be computed separately, because you may have a long DAG in each branch. Spark tries to run all the transformation functions together within a branch rather than trying to optimize things across branches.

On Jul 16, 2015 1:40 PM, "Bin Wang" <wbi...@gmail.com> wrote:
> What if I use both rdd1 and rdd2 later?
>
> Raghavendra Pandey <raghavendra.pan...@gmail.com> wrote on Thursday, July 16, 2015, at 4:08 PM:
>
>> If you cache the RDD, it will save some operations. But filter is a
>> lazy operation anyway, and it runs based on what you do later with rdd1
>> and rdd2...
>>
>> Raghavendra
>>
>> On Jul 16, 2015 1:33 PM, "Bin Wang" <wbi...@gmail.com> wrote:
>>
>>> If I write code like this:
>>>
>>> val rdd = input.map(_.value)
>>> val f1 = rdd.filter(_ == 1)
>>> val f2 = rdd.filter(_ == 2)
>>> ...
>>>
>>> Then the DAG of the execution may be this:
>>>
>>>        -> Filter -> ...
>>> Map
>>>        -> Filter -> ...
>>>
>>> But the two filters operate on the same RDD, which means it could be
>>> done by scanning the RDD once. Does Spark have this kind of
>>> optimization for now?
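Putting the advice from the replies together: Spark will not merge the two filter branches into a single scan on its own, but calling `cache()` (or `persist()`) on the shared parent RDD makes the map run once and lets both branches read the cached partitions. A minimal spark-shell sketch of that idea, where the `Record` case class and sample data are made up for illustration (the thread does not show `input`'s real type):

```scala
// Assumes a SparkContext `sc` is available, as in spark-shell.
case class Record(value: Int)  // hypothetical element type standing in for `input`

val input = sc.parallelize(Seq(Record(1), Record(2), Record(1), Record(3)))

// Cache the shared parent so the map is computed once, not once per branch.
val rdd = input.map(_.value).cache()

val f1 = rdd.filter(_ == 1)  // lazy; no work happens yet
val f2 = rdd.filter(_ == 2)  // lazy; no work happens yet

f1.count()  // first action materializes the map and populates the cache
f2.count()  // second action reuses the cached partitions instead of re-mapping
```

Without the `cache()`, each action would recompute `input.map(_.value)` from scratch, which is the duplicated scan the original question is asking about.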