Using coalesce might be dangerous, since a single worker process will need
to handle the whole file, and if the file is huge you'll get an OOM. However,
it depends on the implementation; I'm not sure exactly how it will be done.
Nevertheless, it's worth trying the coalesce method (please post your results).
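
For illustration, a minimal sketch of that call (the same one Mohammed shows
in the quoted message below); rdd and the output path are placeholders:

    // Everything is funneled through a single partition (one task), so a very
    // large dataset can push that one executor past its memory limit.
    rdd.coalesce(1).saveAsTextFile("hdfs:///tmp/single-file-output")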

Another option would be to use FileUtil.copyMerge, which copies each
partition one after another into the destination stream (file). So once
you've written your HDFS output with Spark using multiple partitions in
parallel (as usual), you can add another step to merge the parts into any
destination you want; a rough sketch follows below.
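
A rough, self-contained sketch of that two-step approach, assuming a Hadoop
2.x client (where FileUtil.copyMerge is available); the application name,
paths, and sample data are placeholders:

    import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}
    import org.apache.spark.{SparkConf, SparkContext}

    object MergePartsExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("MergePartsExample"))

        // Step 1: write in parallel as usual; this produces many part-* files.
        val rdd = sc.parallelize(1 to 1000000).map(_.toString)
        rdd.saveAsTextFile("hdfs:///tmp/parts")

        // Step 2: merge the part files into a single destination file.
        val hadoopConf = sc.hadoopConfiguration
        val fs = FileSystem.get(hadoopConf)
        FileUtil.copyMerge(
          fs, new Path("hdfs:///tmp/parts"),      // source dir with part-* files
          fs, new Path("hdfs:///tmp/merged.txt"), // single destination file
          false,                                  // deleteSource: keep the parts
          hadoopConf,
          null)                                   // nothing appended between parts

        sc.stop()
      }
    }

Since copyMerge streams each part into the destination one after another, it
avoids pulling the whole dataset through a single Spark partition.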

On 5 August 2015 at 07:43, Mohammed Guller <moham...@glassbeam.com> wrote:

> Just to further clarify, you can first call coalesce with argument 1 and
> then call saveAsTextFile. For example,
>
> rdd.coalesce(1).saveAsTextFile(...)
>
> Mohammed
>
> *From:* Mohammed Guller
> *Sent:* Tuesday, August 4, 2015 9:39 PM
> *To:* 'Brandon White'; user
> *Subject:* RE: Combining Spark Files with saveAsTextFile
>
> One option is to use the coalesce method in the RDD class.
>
> Mohammed
>
> *From:* Brandon White [mailto:bwwintheho...@gmail.com]
> *Sent:* Tuesday, August 4, 2015 7:23 PM
> *To:* user
> *Subject:* Combining Spark Files with saveAsTextFile
>
> What is the best way to make saveAsTextFile save as only a single file?
>
