Yes, saveAsTextFile() writes one part file per RDD partition. When you coalesce(1), you move everything in the RDD into a single partition, which then gives you a single output file.
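
For example, a minimal sketch based on the pipeline from your message below (assuming sortedMap is the RDD you describe):

    // coalesce(1) first, so everything lands in one partition and
    // saveAsTextFile() writes a single part file
    sortedMap.values
      .map(_.mkString(","))
      .coalesce(1)
      .saveAsTextFile("s3n://bucket/concatted.csv")
    // result: s3n://bucket/concatted.csv/part-00000 (one file)
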
It will still be called part-00000 or something like that, because that's defined by the Hadoop API that Spark uses for reading from/writing to S3. I don't know of a way to change that from within saveAsTextFile() itself (see the sketch after the quoted message below for one possible workaround).

On Wed, Apr 30, 2014 at 2:47 PM, Peter <thenephili...@yahoo.com> wrote:

> Ah, looks like RDD.coalesce(1) solves one part of the problem.
>
> On Wednesday, April 30, 2014 11:15 AM, Peter <thenephili...@yahoo.com> wrote:
>
> Hi
>
> Playing around with Spark & S3, I'm opening multiple objects (CSV files) with:
>
>     val hfile = sc.textFile("s3n://bucket/2014-04-28/")
>
> so hfile is an RDD representing the 10 objects that were "underneath" 2014-04-28. After I've sorted and otherwise transformed the content, I'm trying to write it back to a single object:
>
>     sortedMap.values.map(_.mkString(",")).saveAsTextFile("s3n://bucket/concatted.csv")
>
> Unfortunately this results in a "folder" named concatted.csv with 10 objects underneath, part-00000 .. part-00009, corresponding to the 10 original objects loaded.
>
> How can I achieve the desired behaviour of writing a single object named concatted.csv?
>
> I've tried 0.9.1 and 1.0.0-RC3.
>
> Thanks!
> Peter
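
If you really need the object to be literally named concatted.csv, one possible workaround (just a sketch, with a placeholder temporary prefix; not something the thread confirms) is to write to a temporary location and then rename the single part file through the same Hadoop FileSystem API. Keep in mind that on S3 a "rename" is implemented as a copy followed by a delete, not an atomic move:

    import java.net.URI
    import org.apache.hadoop.fs.{FileSystem, Path}

    // write everything to a temporary prefix first
    val tmp = "s3n://bucket/concatted-tmp"
    sortedMap.values.map(_.mkString(",")).coalesce(1).saveAsTextFile(tmp)

    // then move the single part file to the final object name
    // (on S3 this is a copy plus a delete under the hood)
    val fs = FileSystem.get(new URI(tmp), sc.hadoopConfiguration)
    fs.rename(new Path(tmp + "/part-00000"), new Path("s3n://bucket/concatted.csv"))
    fs.delete(new Path(tmp), true) // remove the leftover output "folder"
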