Hi,

This seems to be just to prevent writers from leaving partial data in a destination dir when jobs fail. Previous versions of Spark had a way to write data directly to the destination, but Spark v2.0+ removed it because of a critical issue on S3 (see: SPARK-10063).
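To make the commit protocol concrete, here is a minimal sketch (an illustration only, not Spark's actual FileOutputCommitter code) of the write-to-temporary-then-commit pattern: tasks write under a `_temporary` subdirectory, finished files are renamed into the destination at commit, and the temp dir is deleted either way, so a failed job leaves nothing partial behind.

```python
import os
import shutil
import tempfile

def write_job(dest_dir, records, fail=False):
    """Sketch of a two-phase commit: write to _temporary, then rename."""
    os.makedirs(dest_dir, exist_ok=True)
    # Tasks write under a _temporary subdirectory of the destination.
    tmp_dir = os.path.join(dest_dir, "_temporary")
    os.makedirs(tmp_dir, exist_ok=True)
    try:
        part = os.path.join(tmp_dir, "part-00000")
        with open(part, "w") as f:
            f.write("\n".join(records))
        if fail:
            raise RuntimeError("task failed")
        # Commit: move finished files into the destination. On a real
        # filesystem rename is cheap; on S3 it is a copy+delete, which is
        # why large jobs can sit idle during the commit phase.
        os.rename(part, os.path.join(dest_dir, "part-00000"))
    finally:
        # Cleanup of the temp dir is why delete permission is required.
        shutil.rmtree(tmp_dir, ignore_errors=True)

dest = tempfile.mkdtemp()
write_job(dest, ["a", "b"])
print(sorted(os.listdir(dest)))  # only committed files remain
```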
// maropu

On Thu, Aug 25, 2016 at 2:40 PM, Tal Grynbaum <tal.grynb...@gmail.com> wrote:
> I read somewhere that it's because S3 has to know the size of the file
> upfront. I don't really understand this: why is it OK not to know it for
> the temp files but not OK for the final files?
> The delete permission is the minor disadvantage from my side; the worst
> thing is that I have a cluster of 100 machines sitting idle for 15 minutes
> waiting for the copy to end.
>
> Any suggestions how to avoid that?
>
> On Thu, Aug 25, 2016, 08:21 Aseem Bansal <asmbans...@gmail.com> wrote:
>
>> Hi
>>
>> When Spark saves anything to S3 it creates temporary files. Why? Asking
>> this as this requires the access credentials to be given
>> delete permissions along with write permissions.
>

--
---
Takeshi Yamamuro
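A note on the idle-cluster complaint quoted above: one commonly cited mitigation (a sketch, worth verifying against your specific Spark/Hadoop versions) is Hadoop's FileOutputCommitter algorithm version 2, which moves task output into the destination at task commit rather than copying everything during the single job-commit step, shortening the period where executors sit idle:

```properties
# spark-defaults.conf (sketch; verify behavior for your Spark/Hadoop versions)
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version 2
```

Note this changes failure semantics: with version 2, output from committed tasks may already be visible in the destination if the job later fails.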