Hi Sidhartha,

I don't think you should worry about this.

Currently the `StreamingFileSink` uses a long to keep this counter. The
maximum value of a long is 9,223,372,036,854,775,807, and the counter would
only be reset if the number of files reached that value, which I don't think
will happen in practice. WRT the max filename length, Linux, for example,
allows 255 characters on most file systems [1]. That is far more than the 19
digits of the maximum long value.
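
To make the arithmetic concrete, here is a minimal sketch. It is not Flink's
code: the class and method names are hypothetical, the name pattern
"part-<subtask>-<counter>" is the one described in the original question, and
the 255 limit is the figure quoted above.

```java
public class PartFileNameLength {
    // Per-name limit quoted in this thread for most Linux file systems (assumption).
    public static final int MAX_FILENAME_LENGTH = 255;

    // Hypothetical helper: builds a name in the "part-<subtask>-<counter>" pattern.
    public static String partFileName(int subtaskIndex, long partCounter) {
        return "part-" + subtaskIndex + "-" + partCounter;
    }

    // True if the generated name stays within the assumed filename limit.
    public static boolean fitsInFileName(int subtaskIndex, long partCounter) {
        return partFileName(subtaskIndex, partCounter).length() <= MAX_FILENAME_LENGTH;
    }

    public static void main(String[] args) {
        // Worst case for this pattern: largest int subtask index, largest long counter.
        String longest = partFileName(Integer.MAX_VALUE, Long.MAX_VALUE);
        System.out.println(longest + " has " + longest.length() + " characters");
        System.out.println("fits: " + fitsInFileName(Integer.MAX_VALUE, Long.MAX_VALUE));
    }
}
```

Even in this worst case the name comes to 35 characters, well under the limit.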

[1] https://unix.stackexchange.com/questions/32795/what-is-the-maximum-allowed-filename-and-folder-size-with-ecryptfs

Thanks,
Biao /'bɪ.aʊ/



On Fri, Aug 2, 2019 at 12:24 AM sidhartha saurav <sidsau...@gmail.com>
wrote:

> Thank you for the clarification Haibo and Andrey.
>
> Is there any limit after which the global counter will reset? I mean, do
> we have to worry that the counter may get too long and the part file name
> exceeds the max filename length limit set by the OS, or is that handled by
> Flink?
>
> Thanks
> Sidhartha
>
> On Tue, Jul 30, 2019, 10:10 AM Andrey Zagrebin <and...@ververica.com>
> wrote:
>
>> Hi Sidhartha,
>>
>> This is a general limitation for now because Flink does not keep a counter
>> per bucket, only a global one.
>> Flink assumes that the sink can write to any bucket at any time, so the
>> counter is not reset, to avoid overwriting the previously written file
>> number 0.
>>
>> Best,
>> Andrey
>>
>> On Tue, Jul 30, 2019 at 7:01 AM Haibo Sun <sunhaib...@163.com> wrote:
>>
>>> Hi Sidhartha,
>>>
>>> Currently, the part counter is never reset to 0, nor is it possible to
>>> customize the part filename. So I don't think there's any way to reset it
>>> right now. I guess the reason it can't be reset to 0 is that the previous
>>> parts could otherwise be overwritten. Although the bucket id is part of
>>> the part file path, StreamingFileSink does not know when the bucket id
>>> will change in the case of a custom BucketAssigner.
>>>
>>> Best,
>>> Haibo
>>>
>>> At 2019-07-30 06:13:54, "sidhartha saurav" <sidsau...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> We are using StreamingFileSink with a custom BucketAssigner and
>>> DefaultRollingPolicy. The custom BucketAssigner is simply a date bucket
>>> assigner. The StreamingFileSink creates part files with name
>>> "part-<subtask_number>-<count_of_the_bucket_created_by_that_subtask>". The
>>> count is an integer and is incrementing on each rollover. Now my doubts
>>> are:
>>>
>>> 1. When does this count reset to 0?
>>> 2. Is there a way I can reset this count programmatically? Since we are
>>> using a day bucket, we would like the count to reset every day.
>>>
>>> We are using Flink 1.8
>>>
>>> Thanks
>>> Sidhartha
>>>
>>>
