Re: Rolling policy when using StreamingFileSink for bulk-encoded output

2019-07-03 Thread Kostas Kloudas
Thanks Ying! Looking forward to your contribution. Kostas On Wed, Jul 3, 2019 at 6:48 PM Ying Xu wrote: > Hi Kostas: > > For simplicity FLINK-13027 has been assigned to my current user ID. I will contribute using that ID. > > Will

Re: Rolling policy when using StreamingFileSink for bulk-encoded output

2019-07-03 Thread Ying Xu
Hi Kostas: For simplicity FLINK-13027 has been assigned to my current user ID. I will contribute using that ID. Will circulate with the community once we have initial success with this new rolling policy! Thank you again. - Ying On Fri,

Re: Rolling policy when using StreamingFileSink for bulk-encoded output

2019-06-28 Thread Ying Xu
Hi Kostas: I'd like to. The account used to file the JIRA does not have contributor access yet. I contributed to a few Flink JIRAs in the past using a similar but different account. Now I would like to consolidate and use a common account for Apache project contributions. Would you

Re: Rolling policy when using StreamingFileSink for bulk-encoded output

2019-06-28 Thread Kostas Kloudas
Hi Ying, That sounds great! Looking forward to your PR! Btw don't you want to assign the issue to yourself if you are planning to work on it? Kostas On Fri, Jun 28, 2019 at 9:54 AM Ying Xu wrote: > Thanks Kostas for confirming! > > I've filed an issue FLINK-13027 >

Re: Rolling policy when using StreamingFileSink for bulk-encoded output

2019-06-28 Thread Ying Xu
Thanks Kostas for confirming! I've filed an issue, FLINK-13027. We are actively working on the interface of such a file rolling policy, and will also perform benchmarks when it is integrated with a StreamingFileSink. We are more than happy to

Re: Rolling policy when using StreamingFileSink for bulk-encoded output

2019-06-25 Thread Kostas Kloudas
Hi Ying, You are right! If it is either on checkpoint or on size, then this is doable even with the current state of things. Could you open a JIRA so that we can keep track of the progress? Cheers, Kostas On Tue, Jun 25, 2019 at 9:49 AM Ying Xu wrote: > Hi Kostas: > > Thanks for the prompt

Re: Rolling policy when using StreamingFileSink for bulk-encoded output

2019-06-25 Thread Ying Xu
Hi Kostas: Thanks for the prompt reply. The file rolling policy mentioned previously is meant to roll files EITHER when a size limit is reached, OR when a checkpoint happens. Looks like every time a file is rolled, the part file is closed
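
For readers of this thread, the "roll on size OR on checkpoint" rule described above can be sketched in a few lines of plain Java. This is only an illustrative sketch, not the implementation proposed in FLINK-13027 and not Flink's actual RollingPolicy interface: the class names PartFileState and SizeOrCheckpointPolicy, the field maxPartSizeBytes, and the method shapes are all hypothetical and chosen for the example.

```java
// Hypothetical, self-contained sketch. These types are NOT Flink classes;
// they only illustrate the decision rule discussed in the thread.
public class SizeOrCheckpointRollingSketch {

    /** Illustrative stand-in for what a sink knows about the in-progress part file. */
    static final class PartFileState {
        final long sizeInBytes;

        PartFileState(long sizeInBytes) {
            this.sizeInBytes = sizeInBytes;
        }
    }

    /** Rolls on every checkpoint, or earlier once the part file reaches a size limit. */
    static final class SizeOrCheckpointPolicy {
        private final long maxPartSizeBytes; // assumed, user-chosen threshold

        SizeOrCheckpointPolicy(long maxPartSizeBytes) {
            if (maxPartSizeBytes <= 0) {
                throw new IllegalArgumentException("maxPartSizeBytes must be positive");
            }
            this.maxPartSizeBytes = maxPartSizeBytes;
        }

        /** A checkpoint always closes the current part file. */
        boolean shouldRollOnCheckpoint(PartFileState part) {
            return true;
        }

        /** Between checkpoints, roll only when the size limit has been reached. */
        boolean shouldRollOnEvent(PartFileState part) {
            return part.sizeInBytes >= maxPartSizeBytes;
        }
    }

    public static void main(String[] args) {
        SizeOrCheckpointPolicy policy = new SizeOrCheckpointPolicy(128L * 1024 * 1024);
        PartFileState small = new PartFileState(10L * 1024 * 1024);
        PartFileState large = new PartFileState(200L * 1024 * 1024);

        System.out.println("small file, on event:      " + policy.shouldRollOnEvent(small));
        System.out.println("large file, on event:      " + policy.shouldRollOnEvent(large));
        System.out.println("small file, on checkpoint: " + policy.shouldRollOnCheckpoint(small));
    }
}
```

The size check bounds part-file size between checkpoints, while the unconditional checkpoint roll keeps the behaviour that bulk formats already rely on.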

Re: Rolling policy when using StreamingFileSink for bulk-encoded output

2019-06-24 Thread Kostas Kloudas
Hi Ying, Thanks for using the StreamingFileSink. The reason why the StreamingFileSink only supports OnCheckpointRollingPolicy with bulk formats has to do with the fact that currently Flink relies on the Hadoop writer for Parquet. Bulk formats keep important details about how they write the

Rolling policy when using StreamingFileSink for bulk-encoded output

2019-06-24 Thread Ying Xu
Dear Flink community: We have a use case where StreamingFileSink is used for persisting bulk-encoded data to AWS S3. In our case, the data sources consist of hybrid types of events, for which each type is