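You could short-circuit the filtering within the iterator function supplied to mapPartitions. Since the data is sorted, that function can stop consuming a partition's iterator at the first element that fails the condition, instead of testing every remaining element.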
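Here is a rough sketch of what I mean, written in plain Python so it runs without a cluster. The helper name take_sorted_prefix and the sample partitions are made up for illustration. It assumes the elements you want form a prefix of each sorted partition, i.e. once the condition fails it keeps failing.

```python
from itertools import takewhile


def take_sorted_prefix(predicate):
    """Build a per-partition function that stops at the first failing element.

    Hypothetical helper: only valid when each partition is sorted so that
    every element after the first failure would also fail the predicate.
    """
    def per_partition(iterator):
        # takewhile is lazy: it stops pulling from the iterator as soon as
        # the predicate returns False, so the tail is never examined.
        return takewhile(predicate, iterator)
    return per_partition


# Plain lists standing in for the partitions of a sorted RDD.
partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
keep_small = take_sorted_prefix(lambda x: x < 5)

# Apply the function to each partition, as mapPartitions would.
result = [x for part in partitions for x in keep_small(iter(part))]
```

With PySpark the same function should be usable as sorted_rdd.mapPartitions(keep_small), where sorted_rdd is whatever your sorted RDD is called. Each partition is still visited, but each one stops after its first non-matching element.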

On Sunday, November 3, 2013, Xiang Huo wrote:

> Hi all,
>
> I am trying to filter a smaller RDD data set from a large RDD data set.
> And the large one is sorted. So my question is that is there any way to
> make the filter method doesn't check every element in RDD but filter out all
> the other elements when one element doesn't meet the condition of filter.
> Because the large data set is sorted, when there is one element doesn't
> meet the requirement, all the following elements are impossible to meet.
> But checking them one by one will take a relative long time.
> So is there any way to save time for this part?
>
> Thanks,
>
> Xiang
>
> --
> Xiang Huo
> Department of Computer Science
> University of Illinois at Chicago(UIC)
> Chicago, Illinois
> US
> Email: huoxiang5...@gmail.com
> or xh...@uic.edu