[ https://issues.apache.org/jira/browse/FLINK-13027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated FLINK-13027:
-----------------------------------
    Labels: pull-request-available  (was: )

> StreamingFileSink bulk-encoded writer supports file rolling upon customized
> events
> ----------------------------------------------------------------------------------
>
>                 Key: FLINK-13027
>                 URL: https://issues.apache.org/jira/browse/FLINK-13027
>             Project: Flink
>          Issue Type: New Feature
>          Components: API / DataStream
>            Reporter: Ying Xu
>            Assignee: Ying Xu
>            Priority: Major
>              Labels: pull-request-available
>
> When writing in a bulk-encoded format such as Parquet, StreamingFileSink only
> supports OnCheckpointRollingPolicy, which rolls files at checkpoint time.
>
> In many scenarios, it is beneficial for the sink to roll files upon certain
> events, for example, when the file size reaches a limit. Such a rolling
> policy can also potentially alleviate some of the side effects of
> OnCheckpointRollingPolicy, e.g., most of the heavy lifting, including file
> uploading, happens at checkpoint time.
> Specifically, this Jira calls for a new rolling policy that rolls files:
> # whenever a customized event happens, e.g., the file size reaches a certain
> limit.
> # whenever a checkpoint happens. This is needed for providing exactly-once
> guarantees when writing bulk-encoded files.
> Users of this rolling policy need to be aware that the customized event and
> the next checkpoint epoch may be close to each other, which may yield a tiny
> file per checkpoint in the worst case.
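
The two rolling rules requested in the issue can be sketched as follows. This is only an illustration under stated assumptions, not the implementation proposed for FLINK-13027: PartFileState and SizeOrCheckpointRollingPolicy are hypothetical stand-in types, and the sketch does not depend on Flink. The method names loosely echo Flink's RollingPolicy interface.

```java
// Hypothetical stand-in for the state of an in-progress part file.
// Not a Flink class; it only carries the current file size.
final class PartFileState {
    final long sizeInBytes;

    PartFileState(long sizeInBytes) {
        this.sizeInBytes = sizeInBytes;
    }
}

// Hypothetical policy combining the two rules from the issue:
// roll on a customized event (here: a size limit) and on every checkpoint.
final class SizeOrCheckpointRollingPolicy {
    private final long maxPartSizeBytes;

    SizeOrCheckpointRollingPolicy(long maxPartSizeBytes) {
        if (maxPartSizeBytes <= 0) {
            throw new IllegalArgumentException("maxPartSizeBytes must be positive");
        }
        this.maxPartSizeBytes = maxPartSizeBytes;
    }

    // Rule 2: always roll at a checkpoint, which the issue states is needed
    // for exactly-once guarantees with bulk-encoded files.
    boolean shouldRollOnCheckpoint(PartFileState part) {
        return true;
    }

    // Rule 1: roll when the customized event fires, here when the part file
    // has reached the configured size limit.
    boolean shouldRollOnEvent(PartFileState part) {
        return part.sizeInBytes >= maxPartSizeBytes;
    }
}
```

With a 1024-byte limit, a 2048-byte part file would roll on the next event, while a 10-byte part file would roll only at the next checkpoint. The second case is the tiny-file situation the issue warns about.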