Xiangrui,

So, is the full suggested code this:

    val trigger = rddData.zipWithIndex().filter(_._2 >= 10L).map(_._1)

followed by what DB Tsai recommended?

    trigger.mapPartitionsWithIndex((partitionIdx: Int, lines: Iterator[String]) => {
      if (partitionIdx == 0) {
        lines.drop(n)
      }
      lines
    })

Is that the full operation? What happens if I have to drop so many records that the number exceeds the size of partition 0? How do I handle that case?

On Wed, Apr 23, 2014 at 9:51 AM, Xiangrui Meng <men...@gmail.com> wrote:
> If the first partition doesn't have enough records, then it may not
> drop enough lines. Try
>
> rddData.zipWithIndex().filter(_._2 >= 10L).map(_._1)
>
> It might trigger a job.
>
> Best,
> Xiangrui
>
> On Wed, Apr 23, 2014 at 9:46 AM, DB Tsai <dbt...@stanford.edu> wrote:
> > Hi Chengi,
> >
> > If you just want to skip first n lines in RDD, you can do
> >
> > rddData.mapPartitionsWithIndex((partitionIdx: Int, lines: Iterator[String]) => {
> >   if (partitionIdx == 0) {
> >     lines.drop(n)
> >   }
> >   lines
> > }
> >
> > Sincerely,
> >
> > DB Tsai
> > -------------------------------------------------------
> > My Blog: https://www.dbtsai.com
> > LinkedIn: https://www.linkedin.com/in/dbtsai
> >
> > On Wed, Apr 23, 2014 at 9:18 AM, Chengi Liu <chengi.liu...@gmail.com> wrote:
> >>
> >> Hi,
> >> What is the easiest way to skip first n lines in rdd??
> >> I am not able to figure this one out?
> >> Thanks
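
On the question at the top of this message about n exceeding the size of partition 0: as far as I can tell, the two suggestions in this thread are alternatives rather than steps to chain, since applying both would drop records twice. Below is a minimal sketch of each, assuming a Spark version that provides RDD.zipWithIndex (1.0 or later) and a local master for the demo. The object and method names (SkipFirstN, skipByIndex, skipInFirstPartition) are hypothetical, made up for illustration. The partition-0 variant here returns the iterator produced by drop instead of discarding it, because the Scala docs advise against reusing an iterator after calling drop on it.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Hypothetical helper object; the names are illustrative, not part of Spark.
object SkipFirstN {

  // Alternative 1: attach a global index to every record and keep those at
  // position n or later. This works even when n is larger than partition 0,
  // but zipWithIndex runs an extra Spark job when the RDD has more than one
  // partition (the "it might trigger a job" remark in the thread).
  def skipByIndex[T: ClassTag](rdd: RDD[T], n: Long): RDD[T] =
    rdd.zipWithIndex()
      .filter { case (_, idx) => idx >= n }
      .map { case (record, _) => record }

  // Alternative 2: drop records from partition 0 only. No extra job, but it
  // skips fewer than n records when partition 0 holds fewer than n.
  def skipInFirstPartition[T: ClassTag](rdd: RDD[T], n: Int): RDD[T] =
    rdd.mapPartitionsWithIndex { (partitionIdx, lines) =>
      if (partitionIdx == 0) lines.drop(n) else lines
    }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master with 2 threads, just for the demo.
    val conf = new SparkConf().setAppName("skip-first-n").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // 20 records spread over 4 partitions, so partition 0 holds fewer than 7.
      val rddData = sc.parallelize(1 to 20, 4)
      println(skipByIndex(rddData, 7L).collect().mkString(","))
      println(skipInFirstPartition(rddData, 7).collect().mkString(","))
    } finally {
      sc.stop()
    }
  }
}
```

In the demo, only the index-based version is expected to skip exactly 7 records; the partition-0 version can remove at most what partition 0 contains. So if n may exceed the first partition, the zipWithIndex approach on its own looks like the safer choice, at the cost of the extra job.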