We (Databricks) use our own DirectOutputCommitter implementation, which is
a couple dozen lines of Scala code.  The class would be almost entirely a
no-op, except that we took some care to handle the _SUCCESS file properly.
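
To illustrate the idea (this is not the Databricks class), here is a minimal,
dependency-free sketch. It assumes tasks write straight into the final output
directory, so task commit has nothing to move and only the job-level _SUCCESS
marker needs handling. SimpleCommitter and DirectCommitter are hypothetical
names; the method names mirror the lifecycle of Hadoop's OutputCommitter, which
a real implementation would extend instead.

```scala
import java.nio.file.{Files, Path}

// Hypothetical stand-in for the Hadoop OutputCommitter lifecycle.
// This is NOT the Hadoop API, only the same sequence of callbacks.
trait SimpleCommitter {
  def setupJob(): Unit
  def setupTask(taskId: Int): Unit
  def needsTaskCommit(taskId: Int): Boolean
  def commitTask(taskId: Int): Unit
  def abortTask(taskId: Int): Unit
  def commitJob(): Unit
}

// "Direct" commit: tasks write into outputDir itself, so there is no
// temporary directory to rename at commit time.
class DirectCommitter(outputDir: Path, markSuccess: Boolean = true)
    extends SimpleCommitter {

  def setupJob(): Unit = {
    Files.createDirectories(outputDir)
    ()
  }

  def setupTask(taskId: Int): Unit = ()

  // Nothing to move, so no task-level commit is required.
  def needsTaskCommit(taskId: Int): Boolean = false

  def commitTask(taskId: Int): Unit = ()

  // Caveat of the direct approach: partial output of a failed task is
  // already in the final location and is not cleaned up here.
  def abortTask(taskId: Int): Unit = ()

  // The only real work: write the empty _SUCCESS marker once per job.
  def commitJob(): Unit = {
    if (markSuccess) {
      val marker = outputDir.resolve("_SUCCESS")
      if (!Files.exists(marker)) {
        Files.createFile(marker)
      }
    }
  }
}

object DirectCommitterDemo {
  def main(args: Array[String]): Unit = {
    val out = Files.createTempDirectory("direct-commit-demo")
    val committer = new DirectCommitter(out)
    committer.setupJob()
    // A task writes its part file directly to the final location.
    Files.write(out.resolve("part-00000"), "hello\n".getBytes("UTF-8"))
    committer.commitTask(0)
    committer.commitJob()
    println(Files.exists(out.resolve("_SUCCESS")))
  }
}
```

The trade-off is that readers can see partial output before the job finishes,
so the _SUCCESS marker is the only signal that the output is complete.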

On Fri, Feb 20, 2015 at 3:52 PM, Mingyu Kim <m...@palantir.com> wrote:

>  I didn’t get any response. I’d really appreciate it if anyone using a
> special OutputCommitter for S3 could comment on this!
>
>  Thanks,
> Mingyu
>
>   From: Mingyu Kim <m...@palantir.com>
> Date: Monday, February 16, 2015 at 1:15 AM
> To: "user@spark.apache.org" <user@spark.apache.org>
> Subject: Which OutputCommitter to use for S3?
>
>   Hi all,
>
>  The default OutputCommitter used by RDD, which is FileOutputCommitter,
> seems to require moving files at the commit step, which is not a
> constant-time operation in S3, as discussed in
> http://mail-archives.apache.org/mod_mbox/spark-user/201410.mbox/%3c543e33fa.2000...@entropy.be%3E.
> People seem to develop their own NullOutputCommitter implementation or use
> DirectFileOutputCommitter (as mentioned in SPARK-3595
> <https://issues.apache.org/jira/browse/SPARK-3595>),
> but I wanted to check if there is a de facto standard, publicly available
> OutputCommitter to use for S3 in conjunction with Spark.
>
>  Thanks,
> Mingyu
>
