On Thu, Jan 23, 2014 at 12:37 AM, Patrick Wendell <pwend...@gmail.com> wrote:

> What is the ++ operator here? Is this something you defined?

No, it's an alias for union defined in RDD.scala:

def ++(other: RDD[T]): RDD[T] = this.union(other)

> Another issue is that RDDs are not ordered, so when you union two
> together, the result doesn't have a well-defined ordering.
>
> If you do want to do this, you could coalesce into one partition, then
> call mapPartitions and return an iterator that first adds your header
> and then the rest of the file, then call saveAsTextFile. Keep in mind
> this will only work if you coalesce into a single partition.

Thanks! I'll give this a try.

> myRdd.coalesce(1)
>   .map(_.mkString(","))
>   .mapPartitions(it => (Seq("col1,col2,col3") ++ it).iterator)
>   .saveAsTextFile("out.csv")
>
> - Patrick
>
> On Wed, Jan 22, 2014 at 11:12 AM, Aureliano Buendia
> <buendia...@gmail.com> wrote:
> > Hi,
> >
> > I'm trying to find a way to create a csv header when using saveAsTextFile,
> > and I came up with this:
> >
> > (sc.makeRDD(Array("col1,col2,col3"), 1) ++
> > myRdd.coalesce(1).map(_.mkString(",")))
> > .saveAsTextFile("out.csv")
> >
> > But it only saves the header part. Why is it that the union method does not
> > return both RDDs?
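
For anyone reading this thread later, here is a minimal sketch of the coalesce-then-mapPartitions approach Patrick describes, written as a standalone program. The object name CsvWithHeader, the helper prependHeader, the local master setting, and the sample data standing in for myRdd are all assumptions made for illustration; they are not from the thread. The helper uses Iterator(header) ++ rows instead of the Seq(...) ++ it form quoted above, so the partition is streamed rather than collected into a Seq first.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CsvWithHeader {
  // Pure helper (hypothetical name): emit the header line first,
  // then the rows of the partition, without materializing them.
  def prependHeader(header: String)(rows: Iterator[String]): Iterator[String] =
    Iterator(header) ++ rows

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for illustration.
    val conf = new SparkConf().setMaster("local[2]").setAppName("csv-with-header")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical stand-in for myRdd from the thread.
      val myRdd = sc.parallelize(Seq(Array(1, 2, 3), Array(4, 5, 6)))

      myRdd
        .map(_.mkString(","))
        // A single partition means the function below runs once,
        // so the header is written exactly once.
        .coalesce(1)
        .mapPartitions(rows => prependHeader("col1,col2,col3")(rows))
        // Writes a directory named out.csv; it must not already exist.
        .saveAsTextFile("out.csv")
    } finally {
      sc.stop()
    }
  }
}
```

Note that coalesce(1) pushes all the data through one task, so this is probably only reasonable when the output is small enough for a single writer. With more than one partition, the header would be repeated at the top of every part file.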