Now, after clearing state for a key, I don't want that redundant data in the state backend. This is my concern.
Please let me know if there are any gaps.

Thanks,

On Thu, Jun 21, 2018 at 1:31 PM Garvit Sharma wrote:
> I am maintaining state data for a key in ValueState. As per [0], I can
> clear() the state for that key.
>
> [0] https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/stream/state/state.html
>
> Please let me know.
>
> Thanks,
>
> On Thu, Jun 21, 2018 at 1:19 PM sihua zhou wrote:
>> Hi Garvit,
>>
>> Let's say you clear the state at timestamp t1. Then the checkpoints
>> completed before t1 will still contain the data you cleared, but future
>> checkpoints won't contain it anymore. I'm not sure what you mean by
>> "the cleared state", though: currently you can only clear a key-value
>> pair of the state, not the whole state.
>>
>> Best, Sihua
>>
>> On 06/21/2018 15:41, Garvit Sharma wrote:
>>
>> So, would it delete all the files in HDFS associated with the cleared
>> state?
>>
>> On Thu, Jun 21, 2018 at 12:58 PM sihua zhou wrote:
>>
>>> Hi Garvit,
>>>
>>> > Now, let's say we clear the state. Would the state data be removed
>>> > from HDFS too?
>>>
>>> The state data would not be removed from HDFS immediately if you clear
>>> the state in your job. But after you clear the state in your job, later
>>> completed checkpoints won't contain that state anymore.
>>>
>>> > How does Flink manage to clear the state data from the state backend
>>> > on clearing the keyed state?
>>>
>>> 1. You can use {{state.checkpoints.num-retained}} to set the number of
>>> completed checkpoints retained on HDFS.
>>> 2. If you set
>>> {{env.getCheckpointConfig().enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION)}},
>>> the checkpoints on HDFS will be removed once your job is finished (or
>>> canceled). If you set
>>> {{env.getCheckpointConfig().enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION)}},
>>> the checkpoints will be retained.
>>>
>>> Please refer to
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/state/checkpoints.html
>>> for more information.
>>>
>>> Additionally, I'd like to give a brief overview of checkpoints on HDFS.
>>> In a nutshell, whatever you do with the state in your running job only
>>> affects the content of the local state backend. When checkpointing,
>>> Flink takes a snapshot of the local state backend and sends it to the
>>> checkpoint target directory (in your case, HDFS). The checkpoints on
>>> HDFS are like periodic snapshots of your job's state backend: they can
>>> be created or deleted, but never changed. Maybe Stefan (cc) could give
>>> you more expert information, and please correct me if I'm wrong.
>>>
>>> Best, Sihua
>>>
>>> On 06/21/2018 14:40, Garvit Sharma wrote:
>>>
>>> Hi,
>>>
>>> Consider a managed keyed state backed by HDFS with checkpointing
>>> enabled. As the state grows, the state data will be saved on HDFS.
>>>
>>> Now, let's say we clear the state. Would the state data be removed from
>>> HDFS too?
>>>
>>> How does Flink manage to clear the state data from the state backend on
>>> clearing the keyed state?
>>>
>>> --
>>>
>>> Garvit Sharma
>>> github.com/garvitlnmiit/
>>>
>>> No Body is a Scholar by birth, its only hard work and strong
>>> determination that makes him master.
--
Garvit Sharma
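To make the behavior described in this thread concrete, here is a toy model in plain Java. It is a sketch under stated assumptions, not Flink code: every class and method name in it (CheckpointRetentionModel, checkpoint(), anyCheckpointContains(), and so on) is hypothetical, it assumes full (non-incremental) snapshots, and it only mimics what Sihua describes. clear() changes just the live keyed state, each completed checkpoint is an immutable copy, and only the newest N checkpoints are kept, in the spirit of state.checkpoints.num-retained.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy model only (hypothetical names, not Flink API): live keyed state plus a
// bounded history of immutable, full checkpoint snapshots.
class CheckpointRetentionModel {

    private final Map<String, Long> liveState = new HashMap<>();
    // Completed checkpoints, newest first.
    private final Deque<Map<String, Long>> completedCheckpoints = new ArrayDeque<>();
    private final int numRetained;

    CheckpointRetentionModel(int numRetained) {
        if (numRetained < 1) {
            throw new IllegalArgumentException("numRetained must be >= 1");
        }
        this.numRetained = numRetained;
    }

    // Stands in for ValueState.update() on the current key.
    void update(String key, long value) {
        liveState.put(key, value);
    }

    // Stands in for ValueState.clear(): only the live state is touched.
    void clear(String key) {
        liveState.remove(key);
    }

    // Copies the live state into a new immutable snapshot, then discards the
    // oldest snapshots until at most numRetained remain.
    void checkpoint() {
        completedCheckpoints.addFirst(Collections.unmodifiableMap(new HashMap<>(liveState)));
        while (completedCheckpoints.size() > numRetained) {
            completedCheckpoints.removeLast();
        }
    }

    // True if the newest completed checkpoint holds data for the key.
    boolean latestCheckpointContains(String key) {
        Map<String, Long> latest = completedCheckpoints.peekFirst();
        return latest != null && latest.containsKey(key);
    }

    // True if any still-retained checkpoint holds data for the key.
    boolean anyCheckpointContains(String key) {
        for (Map<String, Long> snapshot : completedCheckpoints) {
            if (snapshot.containsKey(key)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        CheckpointRetentionModel model = new CheckpointRetentionModel(2);
        model.update("user-42", 7L);
        model.checkpoint();      // chk-1 contains user-42
        model.clear("user-42");  // the clear() at time t1
        model.checkpoint();      // chk-2 no longer contains user-42
        System.out.println(model.latestCheckpointContains("user-42")); // false
        System.out.println(model.anyCheckpointContains("user-42"));    // true (chk-1 is still retained)
        model.checkpoint();      // chk-3 subsumes chk-1, which is discarded
        System.out.println(model.anyCheckpointContains("user-42"));    // false
    }
}
```

Under these assumptions, data cleared at t1 disappears from HDFS only once every retained checkpoint taken before t1 has been subsumed and discarded; with num-retained set to 1, that happens after the next completed checkpoint. A checkpoint kept with RETAIN_ON_CANCELLATION is a separate case: it survives job cancellation and has to be cleaned up manually.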
