If your file is not very large, try

    sc.wholeTextFiles("...").values.flatMap(_.split("\n").grouped(4).map(_.mkString("\n")))
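For reference, a self-contained sketch of that one-liner (the input path and app name are placeholders; assumes Spark 1.x with the Scala API):

    import org.apache.spark.{SparkConf, SparkContext}

    object FourLineRecords {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("four-line-records"))

        // wholeTextFiles returns (filename, fullContent) pairs and reads each
        // file entirely into memory, which is why it only suits small files.
        val records = sc.wholeTextFiles("hdfs:///path/to/input")
          .values
          .flatMap(_.split("\n")        // back to individual lines
            .grouped(4)                 // every 4 consecutive lines form one record
            .map(_.mkString("\n")))     // re-join each group into a single string

        records.take(5).foreach(println)
        sc.stop()
      }
    }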
-Xiangrui

On Sat, Oct 25, 2014 at 12:57 AM, Parthus <peng.wei....@gmail.com> wrote:
> Hi,
>
> It might be a naive question, but I still hope somebody can help me with
> it.
>
> I have a text file in which every 4 lines represent one record. Since
> SparkContext.textFile() treats each line as a record, it does not fit my
> case. I know that the SparkContext.hadoopFile and newAPIHadoopFile APIs
> can read a file in an arbitrary format, but I do not know how to use them.
> I think there must be an API that easily solves this problem, but I have
> not been able to find it online.
>
> Would it be possible for somebody to show me how to use the API? I run
> Spark on Hadoop 1.2.1 rather than Hadoop 2.x. A few lines of code that
> actually work would be much appreciated.
>
> Thanks very much.
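If the file turns out to be too large for wholeTextFiles, one alternative sketch (untested; assumes a single input file so zipWithIndex yields a stable global line order, and that the line count is a multiple of 4; the path is a placeholder):

    // Index every line globally, then group lines into records of 4 by
    // integer-dividing the index. This distributes across the cluster,
    // at the cost of a shuffle.
    val records = sc.textFile("hdfs:///path/to/input")
      .zipWithIndex()                                // (line, globalIndex)
      .map { case (line, i) => (i / 4, (i, line)) }  // key = record number
      .groupByKey()
      .map { case (_, numbered) =>
        numbered.toList.sortBy(_._1).map(_._2).mkString("\n")
      }

groupByKey is safe here because each key collects exactly 4 lines; keeping the per-line index lets us restore the original order within each record after the shuffle.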