Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-29 Thread Yu Li
Yes let's move on, already cast my vote in the voting thread. Thanks all for the patience answering / addressing my belated questions! Best Regards, Yu On Sun, 27 Sep 2020 at 20:00, Stephan Ewen wrote: > Good to see this FLIP moving. > > From what I understand, the remaining questions are

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-27 Thread Stephan Ewen
Good to see this FLIP moving. >From what I understand, the remaining questions are mainly about how to express the roles of the CheckpointStorage and specifically the behavior of JMCheckpointStorage and FsCheckpointStorage in the docs. This sounds like details we should discuss over the concrete

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-26 Thread Yu Li
Thanks Seth, the updated FLIP overall LGTM, and I've left some inline comments in the doc. Best Regards, Yu On Fri, 25 Sep 2020 at 20:58, Seth Wiesman wrote: > Done > > Seth > > On Fri, Sep 25, 2020 at 2:47 AM Yu Li wrote: > > > *bq. I think it might help to highlight specific stumbling

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-25 Thread Seth Wiesman
Done Seth On Fri, Sep 25, 2020 at 2:47 AM Yu Li wrote: > *bq. I think it might help to highlight specific stumbling blocks users > have today and why I believe this change addresses those issues.* > Thanks for adding more details, I believe adding these blocks to the FLIP > doc could make the

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-25 Thread Yu Li
*bq. I think it might help to highlight specific stumbling blocks users have today and why I believe this change addresses those issues.* Thanks for adding more details, I believe adding these blocks to the FLIP doc could make the motivation more vivid and convincing. *bq. To be concrete I think

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-24 Thread Seth Wiesman
Hi Yu, bq* I thought the FLIP aims at resolving some *existing* confusion, i.e. the durability mystery to users. I think it might help to highlight specific stumbling blocks users have today and why I believe this change addresses those issues. Some frequent things I've heard over the past

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-24 Thread Yu Li
And to make it clear, I'm +1 on the idea of decoupling state backends with checkpointing. I don't have any question about making it clear that heap/RocksDB is where we serve the routine state read/write and where to put the checkpoint data is another story. My only concern lies in the newly

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-24 Thread Yu Li
*bq. What new confusion would be introduced here?* No *new* confusion introduced, but as mentioned at the very beginning of the motivation ("Apache Flink's durability story is a mystery to many users"), I thought the FLIP aims at resolving some *existing* confusions, i.e. the durability mystery to

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-23 Thread Stephan Ewen
I am confused now with the concerns here. This is very much from the user perspective (which is partially also the developer perspective which is the sign of an intuitive abstraction). Of course, there will be docs describing what JMCheckpointStorage and FsCheckpointStorage are. And having

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-23 Thread Aljoscha Krettek
On 23.09.20 04:40, Yu Li wrote: To be specific, with the old API users don't need to set checkpoint storage, instead they only need to pass the checkpoint path w/o caring about the storage. The new APIs are forcing users to set the storage so they have to know the difference between different

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-22 Thread Yu Li
*bq. To restate the motivation of this FLIP, the issue we are trying to solve is that users do not understand how to choose between the different state backends today.* Yes, but my point is that now users may not understand how to choose between different checkpoint storages. We cannot resolve one

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-22 Thread Seth Wiesman
Hi Yu, To restate the motivation of this FLIP, the issue we are trying to solve is that users do not understand how to choose between the different state backends today. The goal is to decouple checkpoint storage from local state storage so users can reason about these configurations separately.

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-21 Thread Yu Li
It seems my questions are misunderstood to be about details on implementations, but actually my concerns are from the users' view, especially how to understand the new APIs and how to choose/use them. To better express my concern, let's list the new APIs this FLIP proposes to introduce:

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-21 Thread Stephan Ewen
To me, the simplifications made by Seth sound good and do make a lot of sense. We should really break this down to a few orthogonal guides, then it is easy for users: - Metadata always goes through the JobManager, no matter what CheckpointStorage. - The JobManagerCheckpointStorage has the

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-21 Thread Seth Wiesman
Hi Yu, Let me address your comments one at a time. I think I can address comments one and two with a single answer. This FLIP does not change any runtime data structures or implementations. As such, it only provides new user-facing factory classes for those components. StateBackend (the

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-21 Thread Yu Li
Thanks for the update Seth, and let me further clarify my comments / concerns around the new `CheckpointStorage`. 1. In the existing `MemoryStateBackend`, there's a `maxStateSize` field which limits the maximal state size sent to JM from one single memory backend, with the default size of 5MB.

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-18 Thread Seth Wiesman
1. With `FSStateBackend`, we used to decide where to store the checkpoint by the `state.backend.fs.memory-threshold` configuration, and we need to decide how to align with this behavior with the new implementation. I see this configuration available on the FileSystemStorage class. I've added that

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-18 Thread Yu Li
*bq. I agree with your assessment of the CheckpointStorage interface but I want to push back at including those changes as a part of this FLIP.* Makes sense, will start a separate discussion around this topic when prepared. *bq. One option could be to rename "CheckpointStorage" to

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-17 Thread Seth Wiesman
That makes sense to me, I've updated the FLIP and also took this chance to make it clearer what the goals and non-goals of this proposal are. Seth On Thu, Sep 17, 2020 at 9:17 AM Stephan Ewen wrote: > Just a quick note that it should be possible to rename "CheckpointStorage" > because it is a

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-17 Thread Stephan Ewen
Just a quick note that it should be possible to rename "CheckpointStorage" because it is a purely internal interface. Looks like the "SnapshotStorage" takes some limited amount of functionality from the "CheckpointStorage", like location pointer resolution. One option could be to rename

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-17 Thread Seth Wiesman
Hi Yu, I've updated the Deprecation / Compatibility / Migration section to more explicitly lay out the steps that we would take as part of this FLIP. It includes your above concerns. Regarding SnapshotStorage vs CheckpointStorage. I'm not sure users are going to have a problem with this. I doubt

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-17 Thread Yu Li
Thanks for the suggestion and discussion, and sorry for being late to the party. For me, +1 for the idea, but +0 for the current FLIP document. First of all, I suggest we explicitly mention the deprecation of existing backends in the document. From the description, we plan to mark all existing

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-16 Thread Congxian Qiu
Thanks for the detailed replay, +1 from my side. Best, Congxian Seth Wiesman 于2020年9月17日周四 上午1:33写道: > Hi Stephan, > > Regarding backward compatibility, I agree and the intention is that all > existing code will continue to function with the same semantics. My working > idea is to remove the

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-16 Thread Seth Wiesman
Hi Stephan, Regarding backward compatibility, I agree and the intention is that all existing code will continue to function with the same semantics. My working idea is to remove the two checkpoint-storage related methods from StateBackend into a new SnapshotStorage interface but then have

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-16 Thread Stephan Ewen
@Yun and @Congxian: I think "async", "incremental", and similar flags belong very much with the state backend (the index structure). They define how the snapshotting procedure behaves. The SnapshotStorage is really just about storage of checkpoint streams (bytes) and handles and pointers. Best,

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-16 Thread Stephan Ewen
Thanks for the great suggestion and the great discussion. Generally big +1 to this effort. Some thoughts from my side: *## Backwards Compatibility* I think we should really strive to make this non breaking. Maybe we have new classes / interfaces for StateBackends and CheckpointStorage and let

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-16 Thread Aljoscha Krettek
I think the mentioned settings should be in the state backend. They configure how a certain backend writes to a snapshot storage, but it’s still the backend that has the logic and decides. I think it's a good point, though, to be conscious about those settings. I'm sure we can figure out the

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-16 Thread Seth Wiesman
Hi Congxian, There is an allusion to those configs in the wiki but let me better spell out my thinking. The flink-conf configurations will not change and I believe the java code switches should remain on the state backend objects. We are of course not fully disentangling state backends from

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-15 Thread Congxian Qiu
Sorry for jump late in. I like the separation here, this separation makes more user friendly now. I just wonder how the configuration such as 'state.backend.incremental', 'state.backend.async' and `state.backend.rocksdb.checkpoint.transfer.thred.num` will be configured after the separation, I

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-15 Thread Seth Wiesman
Sounds good to me. I'll update the FLIP. On Tue, Sep 15, 2020 at 8:35 AM Dawid Wysakowicz wrote: > There is a good number of precedents that introduced backwards > incompatible changes to that interface (which is PublicEvolving btw). We > introduced a couple of additional arguments to the >

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-15 Thread Dawid Wysakowicz
There is a good number of precedents that introduced backwards incompatible changes to that interface (which is PublicEvolving btw). We introduced a couple of additional arguments to the createKeyedStateBackend method and later on removed the methods with default implementation for backwards

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-15 Thread Seth Wiesman
Hey Dawid, I didn't want to break compatibility but if there is precedent and everyone is ok with it then I'm +1. Seth On Tue, Sep 15, 2020 at 2:22 AM Dawid Wysakowicz wrote: > Sorry for joining so late. > > Generally speaking I like this idea very much! > > I have one idea about the

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-15 Thread Dawid Wysakowicz
Sorry for joining so late. Generally speaking I like this idea very much! I have one idea about the StateBackend interface. Could we instead of adding a flag method boolean isLegacyStateBackend remove the checkpointstorage related methods from StateBackend right away? The old/legacy

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-11 Thread Aljoscha Krettek
I could try and come up with a longer name if you need it ... ;-) Aljoscha On 11.09.20 16:25, Seth Wiesman wrote: Having thought about it more, HashMapStateBackend has won me over. I'll update the FLIP. If there aren't any more comments I'll open it up for voting on monday. Seth On Wed, Sep

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-11 Thread Seth Wiesman
Having thought about it more, HashMapStateBackend has won me over. I'll update the FLIP. If there aren't any more comments I'll open it up for voting on monday. Seth On Wed, Sep 9, 2020 at 9:09 AM Seth Wiesman wrote: > @Yun yes, this is really about making CheckpointStorage an orthogonal >

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-09 Thread Seth Wiesman
@Yun yes, this is really about making CheckpointStorage an orthogonal concept. I think we can remain pragmatic and keep state-backend specific configurations (async, incremental, etc) in the state backend themselves. I view these as more advanced configurations and by the time someone is changing

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-09 Thread Aljoscha Krettek
I like it a lot! I think it makes sense to clean this up despite the planned new fault-tolerance mechanisms. In the future, users will decide which mechanism to use and I can imagine that a lot of them will keep using the current mechanism for quite a while to come. But I'm happy to yield to

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-09 Thread Yun Tang
Knauf Sent: Wednesday, September 9, 2020 16:05 To: dev Subject: Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing Thanks for the initiative. Big +1. Would be interested to hear if the proposed interfaces still make sense in the face of the new fault-tolerance work

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-09 Thread Konstantin Knauf
Thanks for the initiative. Big +1. Would be interested to hear if the proposed interfaces still make sense in the face of the new fault-tolerance work that is planned. Stephan/Piotr will know. On Tue, Sep 8, 2020 at 7:05 PM Seth Wiesman wrote: > Hi Devs, > > I'd like to propose an update to how

[DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-08 Thread Seth Wiesman
Hi Devs, I'd like to propose an update to how state backends and checkpoint storage are configured to help users better understand Flink. Apache Flink's durability story is a mystery to many users. One of the most common recurring questions from users comes from not understanding the