Element-wise: that sounds like sequential control flow, whereas RDDs
are inherently parallel collections. I'd also be interested to know
whether it's possible.
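Within a single partition, though, a sequential stop is possible. A partition's elements are consumed through an iterator, so a per-partition function can use a lazy take-while and never pull the elements after the first failure. Below is a minimal plain-Python sketch. It assumes ascending order and an `x < 50` condition, and it simulates partitions as plain lists. The names `take_while_sorted`, `below_50`, and `partitions` are hypothetical. In PySpark the same function could be passed to `rdd.mapPartitions`, but every partition would still launch a task and read at least its first element.

```python
from itertools import takewhile
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def take_while_sorted(pred: Callable[[T], bool]) -> Callable[[Iterable[T]], Iterator[T]]:
    """Build a per-partition function that stops at the first element failing `pred`.

    Assumes each partition is sorted so that once `pred` fails, it fails for
    every later element; itertools.takewhile never pulls those later elements.
    """
    def per_partition(it: Iterable[T]) -> Iterator[T]:
        return takewhile(pred, it)
    return per_partition

# Record every element the predicate actually inspects (local demo only).
inspected: List[int] = []

def below_50(x: int) -> bool:
    inspected.append(x)
    return x < 50

# Simulated partitions of a globally sorted (ascending) data set.
partitions = [[1, 10, 20], [30, 45, 60, 70], [80, 90, 100]]
f = take_while_sorted(below_50)
result = [x for part in partitions for x in f(iter(part))]
print(result)     # [1, 10, 20, 30, 45]
print(inspected)  # [1, 10, 20, 30, 45, 60, 80]
```

Only one failing element per partition is inspected (60 and 80). Elements 70, 90, and 100 are never touched.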

Partition-wise: PartitionPruningRDD [1] may be of help.

[1] http://spark.incubator.apache.org/docs/0.8.0/api/core/org/apache/spark/rdd/PartitionPruningRDD.html
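For a globally sorted RDD, the partition-wise approach comes down to deciding from partition boundaries which partition indices can still contain matches. PartitionPruningRDD is built from a parent RDD and a filter on partition indices, and this decision is what that filter would express. Below is a plain-Python sketch. It assumes ascending order, non-empty partitions, and an `x < threshold` condition. It also assumes each partition's first (smallest) element is already known, for example from a cheap pass that reads one element per partition. The function name `partitions_to_keep` is hypothetical.

```python
from bisect import bisect_left
from typing import List

def partitions_to_keep(first_elements: List[int], threshold: int) -> List[int]:
    """Indices of partitions that can hold elements < threshold.

    Assumes the data is globally sorted ascending and partitions are non-empty,
    so `first_elements` (each partition's smallest element) is ascending too.
    Partition i can hold a match only if its first element is < threshold.
    """
    n_keep = bisect_left(first_elements, threshold)
    return list(range(n_keep))

print(partitions_to_keep([1, 30, 80], 50))  # [0, 1]
```

The kept indices could then drive the partition filter, for example `PartitionPruningRDD.create(sorted, idx => keep.contains(idx))` in Scala (check the linked 0.8.0 API page for the exact signature). Only the last kept partition would still need an element-level stop.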

On Sun, Nov 3, 2013 at 10:42 PM, Xiang Huo <huoxiang5...@gmail.com> wrote:
> Hi all,
>
> I am trying to filter a smaller RDD data set out of a large RDD data set, and
> the large one is sorted. Is there any way to make the filter method stop
> once one element fails the filter condition, instead of checking every
> element in the RDD? Because the large data set is sorted, once one element
> fails the condition, none of the following elements can meet it. Checking
> them one by one takes a relatively long time.
> So is there any way to save time on this part?
>
> Thanks,
>
> Xiang
>
> --
> Xiang Huo
> Department of Computer Science
> University of Illinois at Chicago (UIC)
> Chicago, Illinois
> US
> Email: huoxiang5...@gmail.com
>            or xh...@uic.edu
