Hi,
We are using RDD#mapPartitions() to achieve the same thing.

Are there advantages/disadvantages of using one method over the other?
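
For comparison, here is a minimal sketch of the two shapes side by side. It
uses plain Scala collections as a stand-in for an RDD so that it runs without
Spark. The parse function and the "key=value" line format are hypothetical,
chosen only for illustration. parsePartition has the Iterator-to-Iterator
shape that RDD#mapPartitions() expects, and the lambda passed to flatMap has
the per-element shape suggested earlier in the thread.

```scala
object FilterMapSketch {
  // Hypothetical parser (an assumption, not from the thread):
  // keeps "key=value" lines and rejects everything else.
  def parse(line: String): Option[(String, String)] =
    line.split("=", 2) match {
      case Array(k, v) if k.trim.nonEmpty => Some((k.trim, v.trim))
      case _                              => None
    }

  // Iterator => Iterator, the shape of a function passed to mapPartitions.
  // Any one-time setup placed before the flatMap would run once per
  // partition rather than once per line.
  def parsePartition(lines: Iterator[String]): Iterator[(String, String)] =
    lines.flatMap(line => parse(line))

  def main(args: Array[String]): Unit = {
    // A Seq standing in for the contents of one partition.
    val partition = Seq("a=1", "# comment", "b = 2", "")

    // Per-element style: None drops the line, Some keeps the parsed value.
    val perElement = partition.flatMap(line => parse(line))

    // Per-partition style: same result, computed through an iterator.
    val perPartition = parsePartition(partition.iterator).toList

    println(perElement)
    println(perPartition)
    assert(perElement == perPartition)
  }
}
```

Both styles parse each line exactly once. The per-partition form is mainly
worth the extra ceremony when there is setup work that should be shared
across all lines in a partition.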

Regards,
Jeff

2015-02-26 20:02 GMT+01:00 Mark Hamstra <m...@clearstorydata.com>:

> rdd.map(foo).filter(bar) and rdd.filter(bar).map(foo) will each already be
> pipelined into a single stage, so there generally isn't any need to
> complect the map and filter into a single function.
>
> Additionally, there is RDD#collect[U](f: PartialFunction[T, U])(implicit
> arg0: ClassTag[U]): RDD[U], which only applies the partial function to
> those elements of the RDD for which f is defined.
>
> On Thu, Feb 26, 2015 at 10:49 AM, Crystal Xing <crystalxin...@gmail.com>
> wrote:
>
>> I see.
>> So the reason we can use flatMap to map an element to nothing, but cannot
>> do the same with map, is that flatMap supports mapping each input to zero
>> or more outputs, while map only supports a 1-to-1 mapping?
>>
>> It seems flatMap is closer to Hadoop's map.
>>
>>
>> Thanks,
>>
>> Zheng zhen
>>
>> On Thu, Feb 26, 2015 at 10:44 AM, Sean Owen <so...@cloudera.com> wrote:
>>
>>> You can flatMap:
>>>
>>> rdd.flatMap { in =>
>>>   if (condition(in)) {
>>>     Some(transformation(in))
>>>   } else {
>>>     None
>>>   }
>>> }
>>>
>>> On Thu, Feb 26, 2015 at 6:39 PM, Crystal Xing <crystalxin...@gmail.com>
>>> wrote:
>>> > Hi,
>>> > I have a text file as input, and I want to parse it line by line and map
>>> > each line to another format. At the same time, I want to filter out the
>>> > lines I do not need.
>>> >
>>> > Is there a way to filter out those lines inside the map function?
>>> >
>>> > Do I have to do it in two steps, filter and then map? That way I would
>>> > have to scan and parse the lines twice in order to filter and map.
>>> >
>>> > If I map the unwanted lines to null and then filter out the nulls, will
>>> > that work? I have not tried it yet.
>>> >
>>> > Thanks,
>>> >
>>> > Zheng zheng
>>>
>>
>>
>
