Using coalesce might be dangerous, since a single worker process will need to handle the whole file, and if the file is huge you'll get an OOM. It depends on the implementation, though; I'm not sure how it's done internally. Nevertheless, it's worth trying the coalesce method (please post your results).
Another option would be to use FileUtil.copyMerge, which copies each partition one after another into the destination stream (file). So as soon as you've written your HDFS file with Spark with multiple partitions in parallel (as usual), you can then add another step to merge it into any destination you want.

On 5 August 2015 at 07:43, Mohammed Guller <moham...@glassbeam.com> wrote:

> Just to further clarify, you can first call coalesce with argument 1 and
> then call saveAsTextFile. For example,
>
> rdd.coalesce(1).saveAsTextFile(...)
>
> Mohammed
>
> *From:* Mohammed Guller
> *Sent:* Tuesday, August 4, 2015 9:39 PM
> *To:* 'Brandon White'; user
> *Subject:* RE: Combining Spark Files with saveAsTextFile
>
> One option is to use the coalesce method in the RDD class.
>
> Mohammed
>
> *From:* Brandon White [mailto:bwwintheho...@gmail.com]
> *Sent:* Tuesday, August 4, 2015 7:23 PM
> *To:* user
> *Subject:* Combining Spark Files with saveAsTextFile
>
> What is the best way to make saveAsTextFile save as only a single file?
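To make the copyMerge idea concrete: on HDFS you'd call something like `FileUtil.copyMerge(fs, srcDir, fs, dstFile, false, conf, null)` (the Hadoop 2.x signature), and under the hood it just streams each `part-*` file into the destination in order. Here is a minimal local-filesystem sketch of that merge step using plain `java.nio` rather than the Hadoop API, so you can see (and test) the semantics without a cluster; the file names and directory layout are just illustrative:

```scala
import java.nio.file.{Files, Path, StandardOpenOption}
import scala.jdk.CollectionConverters._

// Merge all "part-*" files in srcDir, in lexicographic order
// (part-00000, part-00001, ...), into a single destination file.
// This mirrors what Hadoop's FileUtil.copyMerge does on HDFS.
def mergeParts(srcDir: Path, dest: Path): Unit = {
  val parts = Files.list(srcDir).iterator().asScala.toSeq
    .filter(_.getFileName.toString.startsWith("part-"))
    .sortBy(_.getFileName.toString)
  val out = Files.newOutputStream(
    dest, StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)
  try parts.foreach(p => Files.copy(p, out)) // append each partition in turn
  finally out.close()
}

// Demo: fake a two-partition Spark output directory, then merge it.
val dir = Files.createTempDirectory("spark-out")
Files.write(dir.resolve("part-00000"), "a\nb\n".getBytes)
Files.write(dir.resolve("part-00001"), "c\nd\n".getBytes)
val merged = dir.resolve("merged.txt")
mergeParts(dir, merged)
println(new String(Files.readAllBytes(merged)))
```

Note that unlike `coalesce(1)`, the write itself still happens in parallel across partitions; only this final sequential merge touches all the data from one process, and it streams rather than buffering, so it avoids the OOM concern above.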