Yes let's move on, already cast my vote in the voting thread.
Thanks all for the patience answering / addressing my belated questions!
Best Regards,
Yu
On Sun, 27 Sep 2020 at 20:00, Stephan Ewen wrote:
> Good to see this FLIP moving.
>
> From what I understand, the remaining questions are
Good to see this FLIP moving.
>From what I understand, the remaining questions are mainly about how to
express the roles of the CheckpointStorage and specifically the behavior of
JMCheckpointStorage and FsCheckpointStorage in the docs.
This sounds like details we should discuss over the concrete
Thanks Seth, the updated FLIP overall LGTM, and I've left some inline
comments in the doc.
Best Regards,
Yu
On Fri, 25 Sep 2020 at 20:58, Seth Wiesman wrote:
> Done
>
> Seth
>
> On Fri, Sep 25, 2020 at 2:47 AM Yu Li wrote:
>
> > *bq. I think it might help to highlight specific stumbling
Done
Seth
On Fri, Sep 25, 2020 at 2:47 AM Yu Li wrote:
> *bq. I think it might help to highlight specific stumbling blocks users
> have today and why I believe this change addresses those issues.*
> Thanks for adding more details, I believe adding these blocks to the FLIP
> doc could make the
*bq. I think it might help to highlight specific stumbling blocks users
have today and why I believe this change addresses those issues.*
Thanks for adding more details, I believe adding these blocks to the FLIP
doc could make the motivation more vivid and convincing.
*bq. To be concrete I think
Hi Yu,
bq* I thought the FLIP aims at resolving some *existing* confusion, i.e.
the durability mystery to users.
I think it might help to highlight specific stumbling blocks users have
today and why I believe this change addresses those issues. Some frequent
things I've heard over the past
And to make it clear, I'm +1 on the idea of decoupling state backends with
checkpointing. I don't have any question about making it clear that
heap/RocksDB is where we serve the routine state read/write and where to
put the checkpoint data is another story. My only concern lies in the newly
*bq. What new confusion would be introduced here?*
No *new* confusion introduced, but as mentioned at the very beginning of
the motivation ("Apache Flink's durability story is a mystery to many
users"), I thought the FLIP aims at resolving some *existing*
confusions, i.e. the durability mystery to
I am confused now with the concerns here. This is very much from the user
perspective (which is partially also the developer perspective which is the
sign of an intuitive abstraction).
Of course, there will be docs describing what JMCheckpointStorage and
FsCheckpointStorage are.
And having
On 23.09.20 04:40, Yu Li wrote:
To be specific, with the old API users don't need to set checkpoint
storage, instead they only need to pass the checkpoint path w/o caring
about the storage. The new APIs are forcing users to set the storage so
they have to know the difference between different
*bq. To restate the motivation of this FLIP, the issue we are trying to
solve is that users do not understand how to choose between the different
state backends today.*
Yes, but my point is that now users may not understand how to choose
between different checkpoint storages. We cannot resolve one
Hi Yu,
To restate the motivation of this FLIP, the issue we are trying to solve is
that users do not understand how to choose between the different state
backends today. The goal is to decouple checkpoint storage from local state
storage so users can reason about these configurations separately.
It seems my questions are misunderstood to be about details on
implementations, but actually my concerns are from the users' view,
especially how to understand the new APIs and how to choose/use them.
To better express my concern, let's list the new APIs this FLIP proposes to
introduce:
To me, the simplifications made by Seth sound good and do make a lot of
sense.
We should really break this down to a few orthogonal guides, then it is
easy for users:
- Metadata always goes through the JobManager, no matter what
CheckpointStorage.
- The JobManagerCheckpointStorage has the
Hi Yu,
Let me address your comments one at a time.
I think I can address comments one and two with a single answer. This FLIP
does not change any runtime data structures or implementations. As such, it
only provides new user-facing factory classes for those components.
StateBackend (the
Thanks for the update Seth, and let me further clarify my comments /
concerns around the new `CheckpointStorage`.
1. In the existing `MemoryStateBackend`, there's a `maxStateSize` field
which limits the maximal state size sent to JM from one single memory
backend, with the default size of 5MB.
1. With `FSStateBackend`, we used to decide where to store the checkpoint
by the `state.backend.fs.memory-threshold` configuration, and we need to
decide how to align with this behavior with the new implementation.
I see this configuration available on the FileSystemStorage class. I've
added that
*bq. I agree with your assessment of the CheckpointStorage interface but I
want to push back at including those changes as a part of this FLIP.*
Makes sense, will start a separate discussion around this topic when
prepared.
*bq. One option could be to rename "CheckpointStorage" to
That makes sense to me, I've updated the FLIP and also took this chance to
make it clearer what the goals and non-goals of this proposal are.
Seth
On Thu, Sep 17, 2020 at 9:17 AM Stephan Ewen wrote:
> Just a quick note that it should be possible to rename "CheckpointStorage"
> because it is a
Just a quick note that it should be possible to rename "CheckpointStorage"
because it is a purely internal interface.
Looks like the "SnapshotStorage" takes some limited amount of functionality
from the "CheckpointStorage", like location pointer resolution.
One option could be to rename
Hi Yu,
I've updated the Deprecation / Compatibility / Migration section to more
explicitly lay out the steps that we would take as part of this FLIP. It
includes your above concerns.
Regarding SnapshotStorage vs CheckpointStorage. I'm not sure users are
going to have a problem with this. I doubt
Thanks for the suggestion and discussion, and sorry for being late to the
party.
For me, +1 for the idea, but +0 for the current FLIP document.
First of all, I suggest we explicitly mention the deprecation of existing
backends in the document. From the description, we plan to mark all
existing
Thanks for the detailed replay, +1 from my side.
Best,
Congxian
Seth Wiesman 于2020年9月17日周四 上午1:33写道:
> Hi Stephan,
>
> Regarding backward compatibility, I agree and the intention is that all
> existing code will continue to function with the same semantics. My working
> idea is to remove the
Hi Stephan,
Regarding backward compatibility, I agree and the intention is that all
existing code will continue to function with the same semantics. My working
idea is to remove the two checkpoint-storage related methods from
StateBackend into a new SnapshotStorage interface but then have
@Yun and @Congxian:
I think "async", "incremental", and similar flags belong very much with the
state backend (the index structure).
They define how the snapshotting procedure behaves.
The SnapshotStorage is really just about storage of checkpoint streams
(bytes) and handles and pointers.
Best,
Thanks for the great suggestion and the great discussion. Generally big +1
to this effort.
Some thoughts from my side:
*## Backwards Compatibility*
I think we should really strive to make this non breaking. Maybe we have
new classes / interfaces for StateBackends and CheckpointStorage and let
I think the mentioned settings should be in the state backend. They
configure how a certain backend writes to a snapshot storage, but it’s
still the backend that has the logic and decides.
I think it's a good point, though, to be conscious about those settings.
I'm sure we can figure out the
Hi Congxian,
There is an allusion to those configs in the wiki but let me better spell
out my thinking. The flink-conf configurations will not change and I
believe the java code switches should remain on the state backend objects.
We are of course not fully disentangling state backends from
Sorry for jump late in.
I like the separation here, this separation makes more user friendly now.
I just wonder how the configuration such as 'state.backend.incremental',
'state.backend.async' and
`state.backend.rocksdb.checkpoint.transfer.thred.num` will be configured
after the separation, I
Sounds good to me. I'll update the FLIP.
On Tue, Sep 15, 2020 at 8:35 AM Dawid Wysakowicz
wrote:
> There is a good number of precedents that introduced backwards
> incompatible changes to that interface (which is PublicEvolving btw). We
> introduced a couple of additional arguments to the
>
There is a good number of precedents that introduced backwards
incompatible changes to that interface (which is PublicEvolving btw). We
introduced a couple of additional arguments to the
createKeyedStateBackend method and later on removed the methods with
default implementation for backwards
Hey Dawid,
I didn't want to break compatibility but if there is precedent and everyone
is ok with it then I'm +1.
Seth
On Tue, Sep 15, 2020 at 2:22 AM Dawid Wysakowicz
wrote:
> Sorry for joining so late.
>
> Generally speaking I like this idea very much!
>
> I have one idea about the
Sorry for joining so late.
Generally speaking I like this idea very much!
I have one idea about the StateBackend interface. Could we instead of
adding a flag method boolean isLegacyStateBackend remove the
checkpointstorage related methods from StateBackend right away? The
old/legacy
I could try and come up with a longer name if you need it ... ;-)
Aljoscha
On 11.09.20 16:25, Seth Wiesman wrote:
Having thought about it more, HashMapStateBackend has won me over. I'll
update the FLIP. If there aren't any more comments I'll open it up for
voting on monday.
Seth
On Wed, Sep
Having thought about it more, HashMapStateBackend has won me over. I'll
update the FLIP. If there aren't any more comments I'll open it up for
voting on monday.
Seth
On Wed, Sep 9, 2020 at 9:09 AM Seth Wiesman wrote:
> @Yun yes, this is really about making CheckpointStorage an orthogonal
>
@Yun yes, this is really about making CheckpointStorage an orthogonal
concept. I think we can remain pragmatic and keep state-backend specific
configurations (async, incremental, etc) in the state backend themselves. I
view these as more advanced configurations and by the time someone is
changing
I like it a lot!
I think it makes sense to clean this up despite the planned new
fault-tolerance mechanisms. In the future, users will decide which
mechanism to use and I can imagine that a lot of them will keep using
the current mechanism for quite a while to come. But I'm happy to yield
to
Knauf
Sent: Wednesday, September 9, 2020 16:05
To: dev
Subject: Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing
Thanks for the initiative. Big +1. Would be interested to hear if the
proposed interfaces still make sense in the face of the new fault-tolerance
work
Thanks for the initiative. Big +1. Would be interested to hear if the
proposed interfaces still make sense in the face of the new fault-tolerance
work that is planned. Stephan/Piotr will know.
On Tue, Sep 8, 2020 at 7:05 PM Seth Wiesman wrote:
> Hi Devs,
>
> I'd like to propose an update to how
Hi Devs,
I'd like to propose an update to how state backends and checkpoint storage
are configured to help users better understand Flink.
Apache Flink's durability story is a mystery to many users. One of the most
common recurring questions from users comes from not understanding the
40 matches
Mail list logo