Element-wise: stopping at the first element that fails the condition sounds like sequential control flow, whereas RDDs are inherently parallel collections. I'm also interested to know whether it's possible.
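One partial workaround, offered as a rough sketch rather than a tested recipe: each partition is consumed sequentially through an iterator, so the early exit can be applied per partition. The snippet below simulates partitions with plain Scala collections so it runs without Spark; the names SortedPrefixFilter, prefixOfPartition and run are made up for illustration. It assumes the data is sorted both within and across partitions, and that the condition holds for a leading run of elements and never again afterwards. On a real RDD I would expect the equivalent to be rdd.mapPartitions(_.takeWhile(keep)), but I have not measured it.

```scala
// Sketch only: partitions are simulated with plain Scala sequences so the
// example runs without a Spark dependency. All names here are hypothetical.
object SortedPrefixFilter {
  // Keeps the leading run of elements in one partition that satisfy `keep`.
  // Iterator.takeWhile is lazy and stops pulling elements at the first failure.
  def prefixOfPartition[T](part: Iterator[T])(keep: T => Boolean): Iterator[T] =
    part.takeWhile(keep)

  // Applies the per-partition prefix filter to every simulated partition and
  // returns the kept elements together with the number of predicate calls.
  def run(partitions: Seq[Seq[Int]], limit: Int): (Seq[Int], Int) = {
    var checks = 0
    val keep = (x: Int) => { checks += 1; x < limit }
    val kept = partitions.flatMap(p => prefixOfPartition(p.iterator)(keep).toList)
    (kept, checks)
  }

  def main(args: Array[String]): Unit = {
    // Three sorted "partitions" of a globally sorted data set.
    val partitions = Seq(Seq(1, 2, 3, 4), Seq(5, 6, 7, 8), Seq(9, 10, 11, 12))
    val (kept, checks) = run(partitions, limit = 6)
    println(s"kept=$kept checks=$checks")
  }
}
```

Every partition is still visited and checks at least one element, so this only saves the per-element work inside partitions past the cut-off; it does not skip those partitions altogether.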
Partition-wise: PartitionPruningRDD [1] may be of help.

[1] http://spark.incubator.apache.org/docs/0.8.0/api/core/org/apache/spark/rdd/PartitionPruningRDD.html

On Sun, Nov 3, 2013 at 10:42 PM, Xiang Huo <huoxiang5...@gmail.com> wrote:
> Hi all,
>
> I am trying to filter a smaller RDD data set from a large RDD data set. And
> the large one is sorted. So my question is that is there any way to make the
> filter method doesn't check every element in RDD but filter out all the other
> elements when one element doesn't meet the condition of filter. Because the
> large data set is sorted, when there is one element doesn't meet the
> requirement, all the following elements are impossible to meet. But checking
> them one by one will take a relative long time.
> So is there any way to save time for this part?
>
> Thanks,
>
> Xiang
>
> --
> Xiang Huo
> Department of Computer Science
> University of Illinois at Chicago(UIC)
> Chicago, Illinois
> US
> Email: huoxiang5...@gmail.com
> or xh...@uic.edu