If your file is not very large, try

    sc.wholeTextFiles("...").values.flatMap(_.split("\n").grouped(4).map(_.mkString("\n")))
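For reference, a self-contained sketch of that one-liner (the input path and app name are placeholders; assumes Spark 1.x with the Scala API):

    import org.apache.spark.{SparkConf, SparkContext}

    object FourLineRecords {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("four-line-records"))

        // wholeTextFiles returns (filename, fullContent) pairs and reads each
        // file entirely into memory, which is why it only suits small files.
        val records = sc.wholeTextFiles("hdfs:///path/to/input")
          .values
          .flatMap(_.split("\n")        // back to individual lines
            .grouped(4)                 // every 4 consecutive lines form one record
            .map(_.mkString("\n")))     // re-join each group into a single string

        records.take(5).foreach(println)
        sc.stop()
      }
    }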
-Xiangrui

On Sat, Oct 25, 2014 at 12:57 AM, Parthus <peng.wei....@gmail.com> wrote:
> Hi,
>
> It might be a naive question, but I still hope somebody can help me with
> it.
>
> I have a text file in which every 4 lines represent one record. Since
> SparkContext.textFile() treats each line as a record, it does not fit my
> case. I know that the SparkContext.hadoopFile and newAPIHadoopFile APIs
> can read a file in an arbitrary format, but I do not know how to use them.
> I think there must be an API that easily solves this problem, but I have
> not been able to find it online.
>
> Would it be possible for somebody to show me how to use the API? I run
> Spark on Hadoop 1.2.1 rather than Hadoop 2.x. A few lines of code that
> actually work would be much appreciated.
>
> Thanks very much.
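If the file turns out to be too large for wholeTextFiles, one alternative sketch (untested; assumes a single input file so zipWithIndex yields a stable global line order, and that the line count is a multiple of 4; the path is a placeholder):

    // Index every line globally, then group lines into records of 4 by
    // integer-dividing the index. This distributes across the cluster,
    // at the cost of a shuffle.
    val records = sc.textFile("hdfs:///path/to/input")
      .zipWithIndex()                                // (line, globalIndex)
      .map { case (line, i) => (i / 4, (i, line)) }  // key = record number
      .groupByKey()
      .map { case (_, numbered) =>
        numbered.toList.sortBy(_._1).map(_._2).mkString("\n")
      }

groupByKey is safe here because each key collects exactly 4 lines; keeping the per-line index lets us restore the original order within each record after the shuffle.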