Is (or was) there an option similar to DirectParquetOutputCommitter for writing JSON files to S3?

On Thu, Aug 25, 2016 at 2:56 PM, Takeshi Yamamuro <linguin....@gmail.com> wrote:

> Hi,
>
> It seems this just prevents writers from leaving partial data in a
> destination dir when jobs fail.
> In previous versions of Spark there was a way to write data directly
> to the destination, but Spark v2.0+ has no way to do that because of
> the critical issue on S3 (see SPARK-10063).
>
> // maropu
>
>
> On Thu, Aug 25, 2016 at 2:40 PM, Tal Grynbaum <tal.grynb...@gmail.com>
> wrote:
>
>> I read somewhere that it's because S3 has to know the size of the file
>> up front.
>> I don't really understand this: why is it OK not to know it for
>> the temp files but not OK for the final files?
>> The delete permission is the minor disadvantage from my side; the worst
>> thing is that I have a cluster of 100 machines sitting idle for 15 minutes
>> waiting for the copy to end.
>>
>> Any suggestions on how to avoid that?
>>
>> On Thu, Aug 25, 2016, 08:21 Aseem Bansal <asmbans...@gmail.com> wrote:
>>
>>> Hi
>>>
>>> When Spark saves anything to S3 it creates temporary files. Why? Asking
>>> because this requires the access credentials to be given
>>> delete permissions along with write permissions.
>>>
>>
>
>
> --
> ---
> Takeshi Yamamuro
>

--
*Tal Grynbaum* / *CTO & co-founder*
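
For readers following the thread, here is a minimal sketch of the write-to-temporary-then-rename pattern under discussion. It is not Spark's or Hadoop's committer code: it uses a local directory as a stand-in for the S3 destination, and the names `write_with_commit`, `visible_files`, and the `fail_at` parameter are hypothetical, introduced only for illustration. It shows why a failed job leaves no partial files in the destination, and why the writer needs delete permission to clean up.

```python
import os
import shutil
import tempfile


def write_with_commit(dest_dir, parts, fail_at=None):
    """Write every part under dest_dir/_temporary, then move them into dest_dir.

    `parts` maps file name -> text. `fail_at` is a hypothetical knob for this
    sketch: the named part raises instead of being written, simulating a
    failed task.
    """
    tmp_dir = os.path.join(dest_dir, "_temporary")
    os.makedirs(tmp_dir, exist_ok=True)
    try:
        for name, text in parts.items():
            if name == fail_at:
                raise RuntimeError("simulated task failure on " + name)
            with open(os.path.join(tmp_dir, name), "w") as f:
                f.write(text)
        # Commit step: reached only if every part was written.
        # On a local disk this is a cheap rename; S3 has no rename, so the
        # equivalent step there is a copy followed by a delete.
        for name in parts:
            os.replace(os.path.join(tmp_dir, name), os.path.join(dest_dir, name))
    finally:
        # Cleanup of the temporary directory is the step that needs
        # delete permission on the destination.
        shutil.rmtree(tmp_dir, ignore_errors=True)


def visible_files(dest_dir):
    """Return the sorted names a reader of dest_dir would see."""
    return sorted(os.listdir(dest_dir))


if __name__ == "__main__":
    ok_dir = tempfile.mkdtemp()
    write_with_commit(ok_dir, {"part-00000.json": "{}", "part-00001.json": "{}"})
    print("after success:", visible_files(ok_dir))

    bad_dir = tempfile.mkdtemp()
    try:
        write_with_commit(
            bad_dir,
            {"part-00000.json": "{}", "part-00001.json": "{}"},
            fail_at="part-00001.json",
        )
    except RuntimeError as err:
        print("job failed:", err)
    print("after failure:", visible_files(bad_dir))
```

In the failing run the first part is written to the temporary directory but never moved, so the destination stays empty. The cost the thread complains about comes from the commit loop: on an object store each move is a full copy of the data.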