Hi Chengi,

If you just want to skip the first n lines of an RDD, you can do:

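rddData.mapPartitionsWithIndex((partitionIdx: Int, lines: Iterator[String]) => {
  // Only the first partition holds the leading lines of the RDD.
  // drop returns a new iterator, so its result must be returned directly;
  // the original iterator should not be reused after calling drop.
  if (partitionIdx == 0) lines.drop(n) else lines
})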
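
One caveat: the approach above only drops lines from partition 0, so if that partition holds fewer than n lines, fewer than n lines get skipped. A rough sketch of an alternative that works across partition boundaries, assuming your Spark version provides RDD.zipWithIndex (the helper name skipFirst is just for illustration, not part of Spark):

```scala
import org.apache.spark.rdd.RDD

// Hypothetical helper: skips the first n elements of an RDD,
// regardless of how the elements are split across partitions.
def skipFirst(rdd: RDD[String], n: Long): RDD[String] =
  rdd.zipWithIndex()                          // pair each line with its global index
    .filter { case (_, idx) => idx >= n }     // keep only lines at index n or later
    .map { case (line, _) => line }           // discard the index again
```

This costs more than the partition-0 version, since zipWithIndex has to run a job to count the elements in each partition when the RDD has more than one partition, so it is probably only worth it when n may exceed the size of the first partition.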


Sincerely,

DB Tsai
-------------------------------------------------------
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai


On Wed, Apr 23, 2014 at 9:18 AM, Chengi Liu <chengi.liu...@gmail.com> wrote:

> Hi,
>   What is the easiest way to skip first n lines in rdd??
> I am not able to figure this one out?
> Thanks
>
