I think this was already answered on Stack Overflow:
http://stackoverflow.com/questions/27854919/skipping-header-file-from-each-csv-file-in-spark
One additional idea from that answer:
If there were just one header line, in the first record, then the most
efficient way to filter it out is:
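A minimal sketch of how that could look in Scala, assuming the header is the first record of the first partition. The helper name `dropHeader` and the input path are hypothetical, and this is not quoted from the linked answer:

```scala
// Hypothetical helper: drop the first record of partition 0 only.
// Other partitions pass through untouched, so no per-row check is needed.
def dropHeader(idx: Int, lines: Iterator[String]): Iterator[String] =
  if (idx == 0) lines.drop(1) else lines

// Assumed usage in spark-shell, where `sc` is the SparkContext:
//   val data = sc.textFile("path/to/input.csv").mapPartitionsWithIndex(dropHeader)
```

Note that this only removes one line in total, so it fits a single input file. With several CSV files, each carrying its own header, the other headers would remain.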
Maybe you can use the wholeTextFiles method, which returns the filename and
content of each file as a PairRDD; then you can remove the first line from each file.
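A rough sketch of that idea, assuming every file starts with exactly one header line. The helper name `stripFirstLine` and the directory path are hypothetical:

```scala
// Hypothetical helper: split one file's content into lines and skip the first.
def stripFirstLine(content: String): Iterator[String] =
  content.split("\r?\n").iterator.drop(1)

// Assumed usage in spark-shell, where `sc` is the SparkContext:
//   val rows = sc.wholeTextFiles("path/to/dir")
//     .flatMap { case (_, content) => stripFirstLine(content) }
```

Since wholeTextFiles reads each file as a single string, this approach is probably best kept for small files.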
-----Original Message-----
From: Hafiz Mujadid [mailto:hafizmujadi...@gmail.com]
Sent: Friday, January 09, 2015 11:48 AM
To:
Did you try something like:
val file = sc.textFile("/home/akhld/sigmoid/input")
// Keep only the rows that do not contain the header text
val skipped = file.filter(row => !row.contains("header"))
skipped.take(10).foreach(println)
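A substring check like the one above would also drop any data row that happens to contain the header text. A possible variant, sketched under the assumption that every file shares the same header line, compares each row against the exact header instead. The helper name `isData` is hypothetical:

```scala
// Hypothetical helper: true for every row that is not exactly the header line.
def isData(header: String)(row: String): Boolean = row != header

// Assumed usage in spark-shell, where `sc` is the SparkContext:
//   val file = sc.textFile("/home/akhld/sigmoid/input")
//   val header = file.first()            // first line of the input
//   val skipped = file.filter(isData(header))
```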
Thanks
Best Regards
On Fri, Jan 9, 2015 at 11:48 AM, Hafiz Mujadid <hafizmujadi...@gmail.com>
wrote:
Suppose I