Re: Union of 2 RDD's only returns the first one

2014-01-22 Thread Patrick Wendell
Ah, somehow after all this time I've never seen that!

On Wed, Jan 22, 2014 at 4:45 PM, Aureliano Buendia wrote:
> On Thu, Jan 23, 2014 at 12:37 AM, Patrick Wendell wrote:
>> What is the ++ operator here? Is this something you defined?
>
> No, it's an alias for union defined in RDD.sc

Re: Union of 2 RDD's only returns the first one

2014-01-22 Thread Aureliano Buendia
On Thu, Jan 23, 2014 at 12:37 AM, Patrick Wendell wrote:
> What is the ++ operator here? Is this something you defined?

No, it's an alias for union defined in RDD.scala:

    def ++(other: RDD[T]): RDD[T] = this.union(other)

> Another issue is that RDD's are not ordered, so when you union two

Re: Union of 2 RDD's only returns the first one

2014-01-22 Thread Patrick Wendell
What is the ++ operator here? Is this something you defined?

Another issue is that RDDs are not ordered, so when you union two together the result doesn't have a well-defined ordering. If you do want to do this, you could coalesce into one partition, then call mapPartitions and return an iterator that fi
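
A minimal sketch of one way to read the suggestion above, assuming the truncated sentence describes an iterator that yields the header line before the data rows. The object `CsvHeader` and the helper `withHeader` are hypothetical names, not part of Spark or of this thread. The helper is written against plain Scala iterators so it runs without a cluster; the comment shows how it might be passed to mapPartitions after coalesce(1).

```scala
// Hypothetical helper (not from the thread or from Spark itself).
object CsvHeader {
  // Prepend a header line to the rows of a single partition.
  def withHeader(header: String)(rows: Iterator[String]): Iterator[String] =
    Iterator.single(header) ++ rows

  def main(args: Array[String]): Unit = {
    // Plain Scala stand-in for the rows of a one-partition RDD.
    val rows = Iterator(Array(1, 2, 3), Array(4, 5, 6)).map(_.mkString(","))
    withHeader("col1,col2,col3")(rows).foreach(println)
  }
}

// Assumed Spark usage, with myRdd as in the original question:
//   myRdd.coalesce(1)
//     .map(_.mkString(","))
//     .mapPartitions(rows => CsvHeader.withHeader("col1,col2,col3")(rows))
//     .saveAsTextFile("out.csv")
```

With a single partition the header is emitted exactly once; with more partitions, each partition would get its own copy, which is why the coalesce step matters in this reading.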

Union of 2 RDD's only returns the first one

2014-01-22 Thread Aureliano Buendia
Hi,

I'm trying to find a way to create a CSV header when using saveAsTextFile, and I came up with this:

    (sc.makeRDD(Array("col1,col2,col3"), 1) ++
      myRdd.coalesce(1).map(_.mkString(",")))
      .saveAsTextFile("out.csv")

But it only saves the header part. Why is it that the union method does not ret