Re: How to collect/take arbitrary number of records in the driver?

2016-02-10 Thread Jakob Odersky
Another alternative:

    rdd.take(1000).drop(100) // this also preserves ordering

Note, however, that this can lead to an out-of-memory error (OOM) on the driver if the data you're taking is too large. If you want to perform some operation sequentially on your driver and don't care about performance, you could do something similar as
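For illustration, here is a minimal sketch of the take-then-drop idea, assuming a local SparkContext and made-up sample data (the numbers 1 to 5000 in 4 partitions). The object name TakeDropSketch is hypothetical. Since take returns a plain Array on the driver, drop is an ordinary Scala collection call; drop(99) is used here on the assumption that "records 100 to 1000" is counted from 1. The second variant uses toLocalIterator, which is only one option that fits a sequential, driver-side approach and is not necessarily what the message above went on to suggest.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TakeDropSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local master and sample data, purely for demonstration.
    val conf = new SparkConf().setAppName("take-drop-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(1 to 5000, numSlices = 4)

      // Pull the first 1000 elements to the driver, then discard the first 99
      // locally. All 1000 elements are held in driver memory at once.
      val viaTake: Array[Int] = rdd.take(1000).drop(99)

      // Stream partitions to the driver one at a time and keep only the
      // elements at zero-based positions 99 (inclusive) to 1000 (exclusive).
      val viaIterator: Array[Int] = rdd.toLocalIterator.slice(99, 1000).toArray

      println(s"take/drop: ${viaTake.length} records")
      println(s"same result via iterator: ${viaTake.sameElements(viaIterator)}")
    } finally {
      sc.stop()
    }
  }
}
```

The take variant needs driver memory for the whole prefix, while the iterator variant needs roughly one partition at a time plus the kept slice, at the cost of running the partitions one after another.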

How to collect/take arbitrary number of records in the driver?

2016-02-09 Thread SRK
Hi, how can I get a fixed range of records from an RDD in the driver? Suppose I want records 100 to 1000 and then want to save them to some external database. I know that I can do it from the workers, per partition, but I want to avoid that for some reasons. The idea is to collect the data to driver and

RE: How to collect/take arbitrary number of records in the driver?

2016-02-09 Thread Mohammed Guller
You can do something like this:

    val indexedRDD = rdd.zipWithIndex
    val filteredRDD = indexedRDD.filter { case (element, index) => (index >= 99) && (index < 199) }
    val result = filteredRDD.take(100)

Warning: the ordering of the elements in the RDD is not guaranteed.

Mohammed
Author: Big Data
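A fuller sketch of the zipWithIndex approach, adapted to the range asked about in the original question (records 100 to 1000, counted from 1, so zero-based indices 99 up to but not including 1000). The helper name recordsInRange, the local master, and the sample data are assumptions for illustration only. The indices that zipWithIndex assigns follow partition order and then the order of elements within each partition, so the selected range is only meaningful if the RDD itself has a well-defined order (for example after a sort).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object RangeToDriver {
  // Hypothetical helper: returns the elements whose zero-based index
  // lies in [from, until), collected to the driver as an Array.
  def recordsInRange[T: ClassTag](rdd: RDD[T], from: Long, until: Long): Array[T] =
    rdd.zipWithIndex()                                   // RDD[(T, Long)]
      .filter { case (_, index) => index >= from && index < until }
      .map { case (element, _) => element }              // drop the index again
      .collect()                                         // only the filtered range reaches the driver

  def main(args: Array[String]): Unit = {
    // Assumption: a local master and sample data, purely for demonstration.
    val conf = new SparkConf().setAppName("range-to-driver").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(1 to 5000, numSlices = 4)
      val result = recordsInRange(rdd, from = 99L, until = 1000L)
      println(s"collected ${result.length} records")
    } finally {
      sc.stop()
    }
  }
}
```

Unlike take followed by drop, the filtering here happens on the executors, so only the requested range is shipped to the driver. Note that zipWithIndex triggers a Spark job of its own when the RDD has more than one partition.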