Hi,

We are using RDD#mapPartitions() to achieve the same thing. Are there any advantages or disadvantages to using one method over the other?
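To make the comparison concrete, here is a minimal sketch of the three variants as I understand them: the flatMap approach, the RDD#collect approach with a partial function, and a mapPartitions version. The names FilterMapSketch, keep and transform, the sample input, and the local[*] master are placeholders I made up for illustration, not anything from our actual job.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object FilterMapSketch {
  // Hypothetical predicate: drop empty lines and comment lines.
  def keep(line: String): Boolean = line.nonEmpty && !line.startsWith("#")

  // Hypothetical transformation: trim and upper-case the line.
  def transform(line: String): String = line.trim.toUpperCase

  // Variant 1: flatMap returning Some/None, so each input yields zero or one output.
  def viaFlatMap(rdd: RDD[String]): RDD[String] =
    rdd.flatMap(line => if (keep(line)) Some(transform(line)) else None)

  // Variant 2: RDD#collect with a partial function, applied only where it is defined.
  def viaCollect(rdd: RDD[String]): RDD[String] =
    rdd.collect { case line if keep(line) => transform(line) }

  // Variant 3: mapPartitions, filtering and mapping the partition's iterator lazily.
  def viaMapPartitions(rdd: RDD[String]): RDD[String] =
    rdd.mapPartitions(iter => iter.filter(keep).map(transform))

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, only for trying the sketch out.
    val conf = new SparkConf().setAppName("filter-map-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val lines = sc.parallelize(Seq("a", "", "# comment", " b "))
      // All three variants should print the same elements.
      println(viaFlatMap(lines).collect().toList)
      println(viaCollect(lines).collect().toList)
      println(viaMapPartitions(lines).collect().toList)
    } finally {
      sc.stop()
    }
  }
}
```

In this sketch all three do a single pass over each line, so the difference is mostly in how the per-element logic is expressed.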
Regards,
Jeff

2015-02-26 20:02 GMT+01:00 Mark Hamstra <m...@clearstorydata.com>:

> rdd.map(foo).filter(bar) and rdd.filter(bar).map(foo) will each already be
> pipelined into a single stage, so there generally isn't any need to
> complect the map and filter into a single function.
>
> Additionally, there is RDD#collect[U](f: PartialFunction[T, U])(implicit
> arg0: ClassTag[U]): RDD[U], which only applies the partial function to
> those elements of the RDD for which f is defined.
>
> On Thu, Feb 26, 2015 at 10:49 AM, Crystal Xing <crystalxin...@gmail.com>
> wrote:
>
>> I see.
>> So the reason we can use flatMap to drop elements, but cannot do the
>> same with map, is that flatMap supports mapping each input to zero or
>> more outputs, whereas map only supports a 1-1 mapping?
>>
>> It seems flatMap is closer to Hadoop's map.
>>
>> Thanks,
>>
>> Zheng zhen
>>
>> On Thu, Feb 26, 2015 at 10:44 AM, Sean Owen <so...@cloudera.com> wrote:
>>
>>> You can flatMap:
>>>
>>> rdd.flatMap { in =>
>>>   if (condition(in)) {
>>>     Some(transformation(in))
>>>   } else {
>>>     None
>>>   }
>>> }
>>>
>>> On Thu, Feb 26, 2015 at 6:39 PM, Crystal Xing <crystalxin...@gmail.com>
>>> wrote:
>>> > Hi,
>>> > I have a text file as input, and I want to parse it line by line and
>>> > map each line to another format. At the same time, I want to filter
>>> > out some lines I do not need.
>>> >
>>> > I wonder if there is a way to filter out those lines in the map
>>> > function.
>>> >
>>> > Do I have to do it in two steps, filter and then map? That way, I
>>> > would have to scan and parse the lines twice in order to filter and
>>> > map.
>>> >
>>> > If I map the unwanted lines to null and then filter out the nulls,
>>> > will that work? I have not tried it yet.
>>> >
>>> > Thanks,
>>> >
>>> > Zheng zheng