Streaming Files to S3

2019-11-25 Thread Li Peng
Hey folks, I'm trying to stream a large volume of data and write it as CSV files to S3. One of the restrictions is to keep the files below 100MB (compressed) and to write one file per minute. I wanted to verify my understanding of StreamingFileSink with you: 1. From the doc

Re: Streaming Files to S3

2019-11-28 Thread Arvid Heise
Hi Li, the S3 file sink writes data into prefixes, with as many part-files as the degree of parallelism. This structure is independent of S3; it comes from the good ol' Hadoop days, where an output folder also contained part-files. However, each of the part-files will be uploaded in a multipart fa
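
To make the two constraints from the question concrete (files under 100MB, a new file every minute) alongside the "one part-file per parallel subtask" layout described in the reply, here is a minimal sketch of a size-or-time rolling decision. It is plain Python, not Flink's API: `PartFile`, `SubtaskWriter`, `should_roll`, and the `part-<subtask>-<counter>` naming are illustrative assumptions, and sizes are treated as already-compressed byte counts.

```python
from dataclasses import dataclass

# Limits taken from the original question; everything else is hypothetical.
MAX_PART_BYTES = 100 * 1024 * 1024  # keep each file below 100MB
ROLLOVER_SECONDS = 60               # start a new file every minute


@dataclass
class PartFile:
    subtask: int      # index of the parallel writer that owns this file
    counter: int      # how many files this writer has opened before
    opened_at: float  # time (seconds) of the first write to this file
    size: int = 0     # bytes written so far (assumed already compressed)

    @property
    def name(self) -> str:
        # Illustrative naming scheme, assumed for this sketch.
        return f"part-{self.subtask}-{self.counter}"


def should_roll(part: PartFile, now: float, next_record_bytes: int) -> bool:
    """Roll if the next record would exceed the size cap or the file is too old."""
    too_big = part.size + next_record_bytes > MAX_PART_BYTES
    too_old = now - part.opened_at >= ROLLOVER_SECONDS
    return too_big or too_old


class SubtaskWriter:
    """One writer per parallel subtask; each keeps exactly one open part-file."""

    def __init__(self, subtask: int, now: float):
        self.subtask = subtask
        self.current = PartFile(subtask, 0, now)
        self.closed = []

    def write(self, record_bytes: int, now: float) -> None:
        if self.current.size == 0:
            # An empty file is never rolled; its clock starts at the first write.
            self.current.opened_at = now
        elif should_roll(self.current, now, record_bytes):
            self.closed.append(self.current)
            self.current = PartFile(self.subtask, self.current.counter + 1, now)
        self.current.size += record_bytes


# With parallelism 4 there are four writers, hence four open part-files at once.
writers = [SubtaskWriter(i, now=0.0) for i in range(4)]
open_names = [w.current.name for w in writers]
```

Because every subtask rolls independently, a job with parallelism N produces up to N files per minute under this scheme, not one; getting a single file per minute would require a parallelism of 1 for the writing step or a separate merge step.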