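Hi Chengi,

If you just want to skip the first n lines of an RDD, you can do:

  rddData.mapPartitionsWithIndex { (partitionIdx: Int, lines: Iterator[String]) =>
    // The first n lines of the input live in partition 0, so only drop there.
    // Return the iterator produced by drop; the original iterator must not be reused after calling it.
    if (partitionIdx == 0) lines.drop(n) else lines
  }

This assumes partition 0 contains at least n lines, which is normally the case when n is small (e.g. skipping a header).

Sincerely,
DB Tsai
-------------------------------------------------------
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai

On Wed, Apr 23, 2014 at 9:18 AM, Chengi Liu <chengi.liu...@gmail.com> wrote:
> Hi,
> What is the easiest way to skip the first n lines in an RDD?
> I am not able to figure this one out.
> Thanks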
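A Spark-free sketch of the partition-0 trick, useful for checking the logic without a cluster. Each inner Seq stands in for one RDD partition, and the helper mapPartitionsWithIndex here is hypothetical: it only mimics the RDD method of the same name. skipFirst applies the drop to partition 0 only; skipFirstByIndex is an index-based variant (the same idea as Spark's RDD.zipWithIndex followed by a filter) that still works when the first n lines span more than one partition.

```scala
// Sketch only: plain Scala collections model RDD partitions (assumption, not Spark's API).
object SkipFirstLines {
  // Hypothetical stand-in for RDD.mapPartitionsWithIndex: applies f to each partition with its index.
  def mapPartitionsWithIndex[A, B](parts: Seq[Seq[A]])(f: (Int, Iterator[A]) => Iterator[B]): Seq[Seq[B]] =
    parts.zipWithIndex.map { case (p, idx) => f(idx, p.iterator).toList }

  // Drops n lines from partition 0 only; correct only if partition 0 holds at least n lines.
  def skipFirst(parts: Seq[Seq[String]], n: Int): Seq[Seq[String]] =
    mapPartitionsWithIndex(parts) { (partitionIdx, lines) =>
      // Use the iterator returned by drop instead of the original one.
      if (partitionIdx == 0) lines.drop(n) else lines
    }

  // Index-based variant: number every line globally, then keep lines with index >= n.
  def skipFirstByIndex(parts: Seq[Seq[String]], n: Long): Seq[String] =
    parts.flatten.zipWithIndex.collect { case (line, i) if i >= n => line }

  def main(args: Array[String]): Unit = {
    val parts = Seq(Seq("header1", "header2", "a"), Seq("b", "c"))
    println(skipFirst(parts, 2).flatten) // List(a, b, c)
  }
}
```

Note the limitation: with partitions Seq(Seq("h1"), Seq("h2", "a")) and n = 2, skipFirst leaves "h2" in place because partition 0 runs out of lines, while skipFirstByIndex removes it.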