Re: Small-files source - partitioning based on prefix of file

Fabian Hueske Wed, 08 Aug 2018 01:22:42 -0700

Hi Averall,

As Vino said, checkpoints store the state of all operators of an
application.
The state of a monitoring source function is the position in the currently
read split and all splits that have been received and are currently pending.


In case of a recovery, the splits are recovered and the source is reset to
the split that was read when the checkpoint was taken and set to the
correct reading position.
Once, that is done, records that have been read before are read again.
However, that does not affect the exactly-once guarantees of the operators
state because all of them have been reset to the same position.

Best, Fabian

2018-08-08 9:26 GMT+02:00 vino yang <yanghua1...@gmail.com>:

> Hi Averell,
>
> You need to understand that Flink reflects the recovery of the state, not
> the recovery of the record.
> Of course, sometimes your record is state, but sometimes the intermediate
> result of your record is the state.
> It depends on your business logic and your operators.
>
> Thanks, vino.
>
> Averell <lvhu...@gmail.com> 于2018年8月8日周三 下午1:17写道：
>
>> Thank you Fabian.
>> "/In either case, some record will be read twice but if reading position
>> can
>> be reset, you can still have exactly-once state consistency because the
>> state is reset as well./"
>> I do not quite understand this statement. If I have read 30 lines from the
>> checkpoint and sent those 30 records to the next operator, then when the
>> streaming is recovered - resumed from the last checkpoint, the subsequent
>> operator would receive those 30 lines again, am I right?
>>
>> Thanks!
>>
>>
>>
>> --
>> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.
>> nabble.com/
>>
>

Re: Small-files source - partitioning based on prefix of file

Reply via email to