Also, zipWithIndex() is not valid. Did you mean zipPartitions?
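A note for readers of this thread: as far as I know, `zipWithIndex` was added to `RDD` in Spark 1.0, so on an older release the call fails as described here. `zipPartitions` is a different operation. It zips this RDD with other RDDs partition by partition, so it does not number records. When `zipWithIndex` is unavailable, and also to handle the case asked about in the thread where n exceeds the size of partition 0, one option is to count the records in each partition first. You then drop only the part of n that falls inside each partition. The sketch below is plain Scala with no Spark dependency, so it can be run locally. The names `SkipFirstN` and `dropFirstN` and the sample partitions are hypothetical. With Spark, the sizes would presumably come from something like `rddData.mapPartitions(it => Iterator(it.size.toLong)).collect()`, which costs an extra job much like the one Xiangrui mentions. The returned function would then be passed to `rddData.mapPartitionsWithIndex(...)`.

```scala
object SkipFirstN {
  // Builds a per-partition function suitable for rdd.mapPartitionsWithIndex.
  // Assumption: `sizes(i)` is the record count of partition i, collected beforehand.
  def dropFirstN(sizes: Array[Long], n: Long): (Int, Iterator[String]) => Iterator[String] = {
    // starts(i) = global position of the first record in partition i
    val starts = sizes.scanLeft(0L)(_ + _)
    (idx: Int, lines: Iterator[String]) => {
      // Records of this partition that still lie before global position n
      // (0 if the partition starts at or after n, capped at the partition size).
      val toDrop = math.max(0L, math.min(sizes(idx), n - starts(idx)))
      lines.drop(toDrop.toInt)
    }
  }

  def main(args: Array[String]): Unit = {
    // Local stand-in for an RDD with three partitions (hypothetical data).
    val parts = Seq(Seq("a", "b"), Seq("c", "d", "e"), Seq("f"))
    val sizes = parts.map(_.size.toLong).toArray
    val f = dropFirstN(sizes, 3L)
    val result = parts.zipWithIndex.flatMap { case (p, i) => f(i, p.iterator) }
    println(result.mkString(",")) // d,e,f
  }
}
```

Skipping 3 records here empties partition 0 (2 records) and drops just one record from partition 1. This is the situation the single-partition `drop` approach cannot handle.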

On Wed, Apr 23, 2014 at 9:55 AM, Chengi Liu <chengi.liu...@gmail.com> wrote:
> Xiangrui,
> So, is it that full code suggestion is :
> val trigger = rddData.zipWithIndex().filter(_._2 >= 10L).map(_._1)
>
> and then what DB Tsai recommended
> trigger.mapPartitionsWithIndex((partitionIdx: Int, lines: Iterator[String]) => {
>   if (partitionIdx == 0) {
>     lines.drop(n)
>   }
>   lines
> })
>
> Is that the full operation..
>
> What happens, if I have to drop so many records that the number exceeds partition 0.. ??
> How do i handle that case?
>
> On Wed, Apr 23, 2014 at 9:51 AM, Xiangrui Meng <men...@gmail.com> wrote:
>
>> If the first partition doesn't have enough records, then it may not
>> drop enough lines. Try
>>
>> rddData.zipWithIndex().filter(_._2 >= 10L).map(_._1)
>>
>> It might trigger a job.
>>
>> Best,
>> Xiangrui
>>
>> On Wed, Apr 23, 2014 at 9:46 AM, DB Tsai <dbt...@stanford.edu> wrote:
>> > Hi Chengi,
>> >
>> > If you just want to skip first n lines in RDD, you can do
>> >
>> > rddData.mapPartitionsWithIndex((partitionIdx: Int, lines: Iterator[String]) => {
>> >   if (partitionIdx == 0) {
>> >     lines.drop(n)
>> >   }
>> >   lines
>> > }
>> >
>> > Sincerely,
>> >
>> > DB Tsai
>> > -------------------------------------------------------
>> > My Blog: https://www.dbtsai.com
>> > LinkedIn: https://www.linkedin.com/in/dbtsai
>> >
>> > On Wed, Apr 23, 2014 at 9:18 AM, Chengi Liu <chengi.liu...@gmail.com> wrote:
>> >>
>> >> Hi,
>> >> What is the easiest way to skip first n lines in rdd??
>> >> I am not able to figure this one out?
>> >> Thanks
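DB Tsai's per-partition function quoted in this thread calls `lines.drop(n)` and then returns the original `lines`. The iterator produced by `drop` is thrown away, so the first n lines are not reliably skipped. Scala's `Iterator` documentation also warns against using an iterator after calling a method such as `drop` on it. That version is also missing the closing `)` of the `mapPartitionsWithIndex` call. A corrected sketch returns the result of `drop` for partition 0. The names `SkipInFirstPartition` and `skipInFirstPartition` are hypothetical. The code is plain Scala so it runs without Spark. With Spark it would presumably be used as `rddData.mapPartitionsWithIndex(skipInFirstPartition(n))`.

```scala
object SkipInFirstPartition {
  // Per-partition function for rdd.mapPartitionsWithIndex: returns the iterator
  // produced by drop(n) for partition 0 instead of discarding it.
  // Only correct when partition 0 alone holds at least n records.
  def skipInFirstPartition(n: Int): (Int, Iterator[String]) => Iterator[String] =
    (partitionIdx: Int, lines: Iterator[String]) =>
      if (partitionIdx == 0) lines.drop(n) else lines

  def main(args: Array[String]): Unit = {
    val f = skipInFirstPartition(2)
    println(f(0, Iterator("header1", "header2", "row1")).toList) // List(row1)
    println(f(1, Iterator("row2")).toList)                       // List(row2)
  }
}
```

As Xiangrui points out, this still drops too few lines if partition 0 is smaller than n. In that case a per-partition count, or `zipWithIndex` where available, is needed.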