Hey folks, I'm trying to get StreamingFileSink to write to S3 every minute,
using flink-s3-fs-hadoop, relying on the default rolling policy, which is
configured to "roll" every 60 seconds. I assumed that would happen
automatically (I interpreted "rolling" to mean actually completing the
multipart upload to S3).
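For context, the sink is set up roughly like the following sketch (the bucket path, encoder, and exact sizes are placeholders, not my exact job code):

```java
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy;

StreamingFileSink<String> sink = StreamingFileSink
    .forRowFormat(new Path("s3://my-bucket/output"),          // placeholder bucket
                  new SimpleStringEncoder<String>("UTF-8"))
    .withRollingPolicy(
        DefaultRollingPolicy.create()
            .withRolloverInterval(60_000L)       // roll every 60 seconds
            .withInactivityInterval(60_000L)     // or after 60 s of inactivity
            .withMaxPartSize(128 * 1024 * 1024)  // or once a part hits 128 MB
            .build())
    .build();
```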

But I'm not actually seeing files written to s3 at all, instead I see a
bunch of open multipart uploads when I check the AWS s3 console, for
example:

 "Uploads": [
        {
            "Initiated": "2019-12-06T20:57:47.000Z",
            "Key": "2019-12-06--20/part-0-0"
        },
        {
            "Initiated": "2019-12-06T20:57:47.000Z",
            "Key": "2019-12-06--20/part-1-0"
        },
        {
            "Initiated": "2019-12-06T21:03:12.000Z",
            "Key": "2019-12-06--21/part-0-1"
        },
        {
            "Initiated": "2019-12-06T21:04:15.000Z",
            "Key": "2019-12-06--21/part-0-2"
        },
        {
            "Initiated": "2019-12-06T21:22:23.000Z",
            "Key": "2019-12-06--21/part-0-3"
        }
]

And these uploads have been open for a long time: after an hour, none of
them have been completed. Is this the expected behavior? If I
wanted these uploads to actually complete and land in S3 quickly, would I
need to tune the Hadoop S3A settings, e.g. set a smaller buffer/partition
size to force parts to upload sooner
<https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#How_S3A_writes_data_to_S3>
?
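To be concrete, this is the kind of tuning I mean (key names are my guess at how flink-s3-fs-hadoop forwards `s3.`-prefixed options to the underlying S3A filesystem; the value shown is S3's minimum part size, purely illustrative):

```yaml
# flink-conf.yaml -- assumed to be forwarded to fs.s3a.* by flink-s3-fs-hadoop
s3.multipart.size: 5242880   # 5 MB, the smallest part size S3 allows
```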

Thanks,
Li
