This is hard to do in general, but you can get what you are asking for by putting the following class in scope.
implicit class BetterRDD[A: scala.reflect.ClassTag](rdd: org.apache.spark.rdd.RDD[A]) {
  def dropOne = rdd.mapPartitionsWithIndex { (i, iter) =>
    // Advance past the first element of partition 0 only.
    if (i == 0 && iter.hasNext) { iter.next(); iter } else iter
  }
}

On Thu, Oct 2, 2014 at 4:06 PM, Sunny Khatri <sunny.k...@gmail.com> wrote:
> You can do filter with startswith?
>
> On Thu, Oct 2, 2014 at 4:04 PM, SK <skrishna...@gmail.com> wrote:
>
>> Thanks for the help. Yes, I did not realize that the first header line
>> has a different separator.
>>
>> By the way, is there a way to drop the first line that contains the
>> header? Something along the following lines:
>>
>> sc.textFile(inp_file)
>>   .drop(1) // or tail() to drop the header line
>>   .map...  // rest of the processing
>>
>> I could not find a drop() function, or a way to take the bottom (n)
>> elements of an RDD. Alternatively, a way to create the case class schema
>> from the header line of the file and use the rest for the data would be
>> useful - just as a suggestion. Currently I am just deleting this header
>> line manually before processing it in Spark.
>>
>> thanks
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-ArrayIndexOutofBoundsException-tp15639p15642.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
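For anyone who wants to see the per-partition behavior without a cluster handy, here is a minimal sketch of what dropOne does, using plain Scala collections in place of an RDD (the Vectors and sample rows are illustrative, not Spark API):

```scala
// Each inner Vector stands in for one RDD partition; partition 0
// holds the file's first line, i.e. the header.
val partitions = Vector(
  Vector("name,age", "alice,30"), // partition 0 (contains the header)
  Vector("bob,25", "carol,41")    // partition 1
)

// Mirror mapPartitionsWithIndex: drop the first element only when
// the partition index is 0, pass every other partition through.
val withoutHeader = partitions.zipWithIndex.flatMap { case (part, i) =>
  if (i == 0) part.drop(1) else part
}

println(withoutHeader) // Vector(alice,30, bob,25, carol,41)
```

One caveat: this drops the first element of partition 0, which is the file's first line when you read a single text file. If each part-file of a multi-part input carries its own header, you would instead need to filter the header pattern inside every partition.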