OK, I seem to have solved the issue by enabling checkpointing. Based on the
docs (I'm using 1.9.0), it seemed like only StreamingFileSink.forBulkFormat()
should've required checkpointing, but based on this
experience, StreamingFileSink.forRowFormat() requires it too! Is this the
intended behavior? If so, the docs should probably be updated.
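
For anyone reading along: in a real job the fix is a call such as
StreamExecutionEnvironment#enableCheckpointing(intervalMs) before the sink is
added. The toy model below is only a sketch of why that seems to matter. It
assumes that "rolling" just moves a part file to a pending state, and that
pending parts become visible (on s3, the multipart upload is completed) only
when a checkpoint completes. The class and method names are hypothetical and
are not Flink's API or implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a part-file lifecycle; NOT Flink's actual implementation. */
class ToyPartFileSink {
    // Records buffered in the currently open (in-progress) part file.
    private final List<String> inProgress = new ArrayList<>();
    // Parts closed by the rolling policy but not yet visible to readers.
    private final List<List<String>> pending = new ArrayList<>();
    // Parts committed after a completed checkpoint.
    private final List<List<String>> finished = new ArrayList<>();

    void write(String record) {
        inProgress.add(record);
    }

    /** Rolling policy fires: the open part is closed, but only to "pending". */
    void roll() {
        if (!inProgress.isEmpty()) {
            pending.add(new ArrayList<>(inProgress));
            inProgress.clear();
        }
    }

    /** A checkpoint completes: every pending part is committed. */
    void onCheckpointComplete() {
        finished.addAll(pending);
        pending.clear();
    }

    int pendingCount() {
        return pending.size();
    }

    int finishedCount() {
        return finished.size();
    }

    public static void main(String[] args) {
        ToyPartFileSink sink = new ToyPartFileSink();
        sink.write("a");
        sink.roll();
        // prints "after roll: pending=1 finished=0"
        System.out.println("after roll: pending=" + sink.pendingCount()
                + " finished=" + sink.finishedCount());
        sink.onCheckpointComplete();
        // prints "after checkpoint: pending=0 finished=1"
        System.out.println("after checkpoint: pending=" + sink.pendingCount()
                + " finished=" + sink.finishedCount());
    }
}
```

Under this model, a job that never checkpoints keeps accumulating pending
parts no matter how often the rolling policy fires, which would match the open
multipart uploads listed in the quoted message below.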


On Fri, Dec 6, 2019 at 2:01 PM Li Peng <li.p...@doordash.com> wrote:

> Hey folks, I'm trying to get StreamingFileSink to write to s3 every
> minute, with flink-s3-fs-hadoop, and based on the default rolling policy,
> which is configured to "roll" every 60 seconds, I thought that would be
> automatic (I interpreted rolling to mean actually close a multipart upload
> to s3).
> But I'm not actually seeing files written to s3 at all, instead I see a
> bunch of open multipart uploads when I check the AWS s3 console, for
> example:
>  "Uploads": [
>         {
>             "Initiated": "2019-12-06T20:57:47.000Z",
>             "Key": "2019-12-06--20/part-0-0"
>         },
>         {
>             "Initiated": "2019-12-06T20:57:47.000Z",
>             "Key": "2019-12-06--20/part-1-0"
>         },
>         {
>             "Initiated": "2019-12-06T21:03:12.000Z",
>             "Key": "2019-12-06--21/part-0-1"
>         },
>         {
>             "Initiated": "2019-12-06T21:04:15.000Z",
>             "Key": "2019-12-06--21/part-0-2"
>         },
>         {
>             "Initiated": "2019-12-06T21:22:23.000Z",
>             "Key": "2019-12-06--21/part-0-3"
>         }
> ]
> And these uploads have been open for a long time. So far, after an hour,
> none of the uploads have been closed. Is this the expected behavior? If I
> wanted to get these uploads to actually write to s3 quickly, do I need to
> configure the hadoop stuff to get that done, like setting a smaller
> buffer/partition size to force it to upload
> <https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#How_S3A_writes_data_to_S3>
> ?
> Thanks,
> Li
