@Ilya Ganelin, I'm not sure how zipWithIndex() would do less than an O(n) scan. The Spark docs don't mention anything about it.
I found a solution with Spark 1.5.2 OrderedRDDFunctions. It has a filterByRange API.

Thanks

On Sun, Jan 24, 2016 at 10:27 PM, Sonal Goyal <sonalgoy...@gmail.com> wrote:

> One thing you can also look at is to save your data in a way that can be
> accessed through file patterns, e.g. by hour, zone, etc., so that you only
> load what you need.
> On Jan 24, 2016 10:00 PM, "Ilya Ganelin" <ilgan...@gmail.com> wrote:
>
>> The solution I normally use is to zipWithIndex() and then use the filter
>> operation. Filter is an O(m) operation where m is the size of your
>> partition, not an O(N) operation.
>>
>> -Ilya Ganelin
>>
>> On Sat, Jan 23, 2016 at 5:48 AM, Nirav Patel <npa...@xactlycorp.com>
>> wrote:
>>
>>> The problem is that I have an RDD of about 10M rows and it keeps growing.
>>> Every time we want to perform a query and compute on a subset of the data,
>>> we have to use filter and then some aggregation. Here I know filter goes
>>> through each partition and every row of the RDD, which may not be
>>> efficient at all.
>>>
>>> With Spark having ordered RDD functions, I don't see why it's so difficult
>>> to implement such a function. Cassandra/HBase have had it for years, where
>>> they can fetch data only from certain partitions based on your rowkey.
>>> Scala's TreeMap has a range function to do the same.
>>>
>>> I think people have been looking for this for a while. I see several posts
>>> asking this.
>>>
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Does-filter-on-an-RDD-scan-every-data-item-td20170.html#a26048
>>>
>>> By the way, I assume there
>>> Thanks
>>> Nirav
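
For anyone landing on this thread later, here is a minimal sketch of the filterByRange approach mentioned at the top of this message. It is not code from the thread: the object name, the rangeCount helper, the sample data, the partition count of 8 and the local master are all made up for illustration. The idea it tries to show is that once a pair RDD carries a RangePartitioner, filterByRange can skip partitions whose key range cannot match, instead of scanning every partition the way a plain filter does.

```scala
import org.apache.spark.{RangePartitioner, SparkConf, SparkContext}

object FilterByRangeSketch {

  // Hypothetical helper: counts rows whose key lies in [lower, upper].
  def rangeCount(sc: SparkContext, lower: Int, upper: Int): Long = {
    // Made-up data: (rowKey, value) pairs with Int keys 1..1000.
    val rows = sc.parallelize((1 to 1000).map(i => (i, s"row-$i")))

    // Range-partition so each partition holds a contiguous range of keys.
    // sortByKey() would also leave a RangePartitioner on the result.
    val partitioned = rows.partitionBy(new RangePartitioner(8, rows))

    // Both bounds are inclusive. Because the RDD has a RangePartitioner,
    // only partitions that can contain keys in [lower, upper] are computed;
    // without one, this falls back to an ordinary filter over all partitions.
    partitioned.filterByRange(lower, upper).count()
  }

  def main(args: Array[String]): Unit = {
    // App name and local master are placeholders for a quick local run.
    val conf = new SparkConf().setAppName("filterByRange-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(rangeCount(sc, 100, 200)) // keys 100..200 inclusive
    } finally {
      sc.stop()
    }
  }
}
```

In a real job you would presumably partition (or sort) the growing RDD once, cache it, and run many range queries against it, since the range partitioning itself requires a shuffle.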