Hello Rafi,

Thank you for getting back. We have a lifecycle rule set up for the sink
bucket, but not for the S3 bucket that holds the savepoints. This was my
initial hunch too, but we tried restarting the job immediately after
cancelling it and the restore still failed.
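
In case it helps, the cancel-with-savepoint and restore cycle we follow is
roughly the standard CLI flow below (the job ID, savepoint bucket and jar
name are placeholders):

  # take a savepoint while cancelling the running job
  flink cancel -s s3://<savepoint-bucket>/savepoints <job-id>

  # resume the same job from the savepoint that was just written
  flink run -s s3://<savepoint-bucket>/savepoints/savepoint-<id> our-job.jar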

Best,
Swapnil Kumar

On Sat, Aug 17, 2019 at 2:23 PM Rafi Aroch <rafi.ar...@gmail.com> wrote:

> Hi,
>
> S3 would delete files only if you have 'lifecycle rules' [1] defined on
> the bucket. Could that be the case? If so, make sure to disable / extend
> the object expiration period.
>
> [1]
> https://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html
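>
> A quick way to check whether such a rule exists on the savepoints bucket
> is, for example (the bucket name is a placeholder):
>
>   aws s3api get-bucket-lifecycle-configuration --bucket <savepoint-bucket>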
>
> Thanks,
> Rafi
>
>
> On Sat, Aug 17, 2019 at 1:48 AM Oytun Tez <oy...@motaword.com> wrote:
>
>> Hi Swapnil,
>>
>> I am not familiar with the StreamingFileSink; however, this sounds like a
>> checkpointing issue to me. The sink should keep its sink state and remove
>> from that state only the files that it *really successfully* commits
>> (perhaps you may want to add a validation step against S3 to check file
>> integrity). That would leave the failed and partial files in the state.
>>
>>
>>
>> ---
>> Oytun Tez
>>
>> *M O T A W O R D*
>> The World's Fastest Human Translation Platform.
>> oy...@motaword.com — www.motaword.com
>>
>>
>> On Fri, Aug 16, 2019 at 6:02 PM Swapnil Kumar <swku...@zendesk.com>
>> wrote:
>>
>>> Hello, we are using Flink to process and aggregate input events and to
>>> write the output of our streaming job to S3 using StreamingFileSink, but
>>> whenever we try to restore the job from a savepoint, the restoration
>>> fails with a missing part files error. As per my understanding, S3
>>> deletes those intermediate part files, so they can no longer be found on
>>> S3. Is there a workaround for this, so that we can use S3 as a sink?
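>>>
>>> For context, the sink is wired up roughly like the sketch below (the
>>> bucket, path and stream variable are placeholders):
>>>
>>>   import org.apache.flink.api.common.serialization.SimpleStringEncoder;
>>>   import org.apache.flink.core.fs.Path;
>>>   import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
>>>
>>>   // row-format sink writing the aggregated records to S3; in-progress
>>>   // part files are finalized only when checkpoints complete
>>>   StreamingFileSink<String> sink = StreamingFileSink
>>>       .forRowFormat(new Path("s3://<output-bucket>/aggregates"),
>>>                     new SimpleStringEncoder<String>("UTF-8"))
>>>       .build();
>>>
>>>   aggregatedStream.addSink(sink);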
>>>
>>> --
>>> Thanks,
>>> Swapnil Kumar
>>>
>>

-- 
Thanks,
Swapnil Kumar
