Another alternative:
rdd.take(1000).drop(100) // this also preserves ordering
Note, however, that this can lead to an OOM on the driver if the data
you're taking is too large. If you want to perform some operation
sequentially on your driver and don't care about performance, you could
do something similar.
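One way to do that sequential, low-memory driver-side pass is RDD.toLocalIterator, which pulls one partition at a time to the driver instead of materializing take(1000) all at once. A minimal sketch; the slice bounds and the `saveRecord` callback are illustrative assumptions, not from this thread:

```scala
import org.apache.spark.rdd.RDD

// Stream records 100 through 1000 through the driver, one partition
// at a time. `saveRecord` is a hypothetical sink (e.g. a JDBC insert).
def saveSlice[T](rdd: RDD[T], saveRecord: T => Unit): Unit =
  rdd.toLocalIterator   // fetches partitions lazily, one at a time
     .slice(99, 1000)   // 0-based: records 100 through 1000
     .foreach(saveRecord)
```

This trades performance (partitions are fetched serially) for a driver memory footprint of roughly one partition.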
Hi,
How do I get a fixed range of records from an RDD on the driver? Suppose I
want records 100 to 1000 and then want to save them to some external
database. I know that I can do it from the workers per partition, but I want
to avoid that for some reasons. The idea is to collect the data to the
driver and then save it to the database from there.
You can do something like this:
val indexedRDD = rdd.zipWithIndex
val filteredRDD = indexedRDD.filter { case (element, index) =>
  index >= 99 && index < 1000 // 0-based indices for records 100 to 1000
}
val result = filteredRDD.take(901)
Warning: the ordering of the elements in the RDD is not guaranteed.
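Put end to end, with the index bounds set to the range from the question (records 100 to 1000), the approach looks like this. A sketch assuming a live SparkContext `sc` and a hypothetical saveToDb sink, neither of which is from this thread:

```scala
val rdd = sc.parallelize(1 to 10000)

val slice: Array[Int] = rdd
  .zipWithIndex                 // pairs each element with a 0-based Long index
  .filter { case (_, index) => index >= 99 && index < 1000 }
  .map(_._1)                    // drop the index before collecting
  .collect()                    // 901 records arrive on the driver

slice.foreach(saveToDb)         // driver-side save, as the question asked
```

Note that zipWithIndex numbers elements by partition index and position within each partition, so the indices only correspond to a meaningful order if the RDD itself has one (e.g. it was sorted).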
Mohammed
Author: Big Data