Now, after clearing state for a key, I don't want that redundant data in
the state backend. This is my concern.
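
To check my understanding of what Sihua describes below, here is a minimal
sketch in plain Java. It is a toy model and not Flink code: the class
CheckpointRetentionSketch and all of its methods are hypothetical, the maps
only stand in for the local state backend and for the completed checkpoints on
HDFS, and numRetained plays the role of state.checkpoints.num-retained. It
ignores savepoints and incremental checkpoints.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Toy model (not Flink code) of local keyed state plus immutable, retained checkpoints. */
public class CheckpointRetentionSketch {
    // Stand-in for the local state backend: one value per key.
    private final Map<String, Long> localState = new HashMap<>();
    // Stand-in for the completed checkpoints on HDFS, oldest first.
    private final Deque<Map<String, Long>> retained = new ArrayDeque<>();
    // Assumed to behave like state.checkpoints.num-retained.
    private final int numRetained;

    public CheckpointRetentionSketch(int numRetained) {
        this.numRetained = numRetained;
    }

    public void update(String key, long value) {
        localState.put(key, value);
    }

    // Analogous to calling clear() on the state of one key: touches local state only.
    public void clear(String key) {
        localState.remove(key);
    }

    // Snapshot the local state as a read-only copy, then drop the oldest
    // snapshots that exceed the retention limit.
    public void checkpoint() {
        retained.addLast(Collections.unmodifiableMap(new HashMap<>(localState)));
        while (retained.size() > numRetained) {
            retained.removeFirst();
        }
    }

    // True if any retained checkpoint still holds data for the key.
    public boolean retainedCheckpointsContain(String key) {
        return retained.stream().anyMatch(snapshot -> snapshot.containsKey(key));
    }

    public static void main(String[] args) {
        CheckpointRetentionSketch job = new CheckpointRetentionSketch(1);
        job.update("user-1", 42L);
        job.checkpoint();      // checkpoint 1 contains user-1
        job.clear("user-1");   // only the local state changes
        System.out.println(job.retainedCheckpointsContain("user-1")); // true
        job.checkpoint();      // checkpoint 2 pushes checkpoint 1 out
        System.out.println(job.retainedCheckpointsContain("user-1")); // false
    }
}
```

If this model is right, the cleared data leaves HDFS only once every
checkpoint taken before the clear has been dropped by the retention policy.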

Please let me know if there are any gaps.

Thanks,

On Thu, Jun 21, 2018 at 1:31 PM Garvit Sharma <[email protected]> wrote:

> I am maintaining state data for a key in ValueState. As per [0] I can
> clear() state for that key.
>
> [0]
> https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/state/state.html
>
> Please let me know.
>
> Thanks,
>
>
> On Thu, Jun 21, 2018 at 1:19 PM sihua zhou <[email protected]> wrote:
>
>> Hi Garvit,
>>
>> Let's say you clear the state at timestamp t1. Checkpoints completed
>> before t1 will still contain the data you cleared, but future checkpoints
>> won't contain it. I'm not sure what you mean by "the cleared state",
>> though: currently you can only clear the state of a single key (one
>> key-value pair), not the whole state at once.
>>
>> Best, Sihua
>>
>> On 06/21/2018 15:41, Garvit Sharma <[email protected]> wrote:
>>
>> So, would it delete all the files in HDFS associated with the cleared
>> state?
>>
>> On Thu, Jun 21, 2018 at 12:58 PM sihua zhou <[email protected]> wrote:
>>
>>> Hi Garvit,
>>>
>>> > Now, let's say, we clear the state. Would the state data be removed
>>> from HDFS too?
>>>
>>> The state data would not be removed from HDFS immediately when you clear
>>> the state in your job. But after you clear the state, checkpoints that
>>> complete later won't contain it any more.
>>>
>>> > How does Flink manage to clear the state data from state backend on
>>> clearing the keyed state?
>>>
>>> 1. You can use {{state.checkpoints.num-retained}} to set the number of
>>> completed checkpoints retained on HDFS.
>>> 2. If you set
>>> {{env.getCheckpointConfig().enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION)}}
>>> then the checkpoints on HDFS will be removed once your job is finished
>>> (or canceled). If you set
>>> {{env.getCheckpointConfig().enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION)}}
>>> instead, the checkpoints will be retained.
>>>
>>> Please refer to
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/state/checkpoints.html
>>> for more information.
>>>
>>>
>>> Additionally, I'd like to give a brief overview of checkpoints on HDFS.
>>> In a nutshell, whatever you do with the state in your running job only
>>> affects the content of the local state backend. When checkpointing, Flink
>>> takes a snapshot of the local state backend and sends it to the checkpoint
>>> target directory (in your case, HDFS). The checkpoints on HDFS are like
>>> periodic snapshots of your job's state backend: they can be created or
>>> deleted, but never changed. Maybe Stefan (cc) could give you more expert
>>> information, and please correct me if I'm wrong.
>>>
>>> Best, Sihua
>>> On 06/21/2018 14:40, Garvit Sharma <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> Consider a managed keyed state backed by HDFS with checkpointing
>>> enabled. Now, as the state grows the state data will be saved on HDFS.
>>>
>>> Now, let's say, we clear the state. Would the state data be removed from
>>> HDFS too?
>>>
>>> How does Flink manage to clear the state data from state backend on
>>> clearing the keyed state?
>>>
>>> --
>>>
>>> Garvit Sharma
>>> github.com/garvitlnmiit/
>>>
>>> No Body is a Scholar by birth, its only hard work and strong
>>> determination that makes him master.
>>>
>>>
>>
>>
>>
>
>


-- 

Garvit Sharma
github.com/garvitlnmiit/

No Body is a Scholar by birth, its only hard work and strong determination
that makes him master.
