If the first partition doesn't have enough records, then dropping from it alone may not drop enough lines. Try

    rddData.zipWithIndex().filter(_._2 >= 10L).map(_._1)

(here skipping the first 10 lines). Note that zipWithIndex might trigger a job.

Best,
Xiangrui

On Wed, Apr 23, 2014 at 9:46 AM, DB Tsai <dbt...@stanford.edu> wrote:
> Hi Chengi,
>
> If you just want to skip first n lines in RDD, you can do
>
> rddData.mapPartitionsWithIndex((partitionIdx: Int, lines: Iterator[String]) => {
>   // Iterator.drop returns a new iterator, so its result must be returned
>   if (partitionIdx == 0) lines.drop(n) else lines
> })
>
> Sincerely,
>
> DB Tsai
> -------------------------------------------------------
> My Blog: https://www.dbtsai.com
> LinkedIn: https://www.linkedin.com/in/dbtsai
>
>
> On Wed, Apr 23, 2014 at 9:18 AM, Chengi Liu <chengi.liu...@gmail.com> wrote:
>>
>> Hi,
>> What is the easiest way to skip first n lines in rdd??
>> I am not able to figure this one out?
>> Thanks
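
To illustrate the difference between the two suggestions in this thread, here is a minimal sketch that models partitions as plain Scala collections. It does not use Spark at all: the object name SkipFirstN, the Partitions alias, and both helper functions are hypothetical names made up for this example, and each inner Seq merely stands in for one RDD partition.

```scala
// Plain-Scala model of an RDD: each inner Seq stands in for one partition.
// Illustration only; no Spark classes are involved and all names are hypothetical.
object SkipFirstN {
  type Partitions = Seq[Seq[String]]

  // Models the mapPartitionsWithIndex approach: drop n lines from partition 0 only.
  def dropInFirstPartition(parts: Partitions, n: Int): Seq[String] =
    parts.zipWithIndex.flatMap { case (lines, idx) =>
      if (idx == 0) lines.drop(n) else lines
    }

  // Models zipWithIndex().filter(_._2 >= n).map(_._1):
  // assign a global index to every line, then keep lines with index >= n.
  def dropByGlobalIndex(parts: Partitions, n: Long): Seq[String] =
    parts.flatten.zipWithIndex.collect { case (line, i) if i >= n => line }

  def main(args: Array[String]): Unit = {
    // The first partition holds only 2 lines, fewer than n = 3.
    val parts: Partitions = Seq(Seq("a", "b"), Seq("c", "d", "e"))
    println(dropInFirstPartition(parts, 3))
    println(dropByGlobalIndex(parts, 3L))
  }
}
```

With n = 3 and a first partition of only two lines, the per-partition version keeps c, d, e (it skipped just two lines), while the global-index version keeps d, e as intended. On a real RDD the global index has to be computed across partitions, which is why zipWithIndex might trigger a job.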