Hi everyone,
@Aljoscha, I have updated the Per-Job mode Section of the FLIP.
It seems that the people involved in the discussion have reached a consensus. If
there are no more comments, I would like to start the voting thread tomorrow.
Best,
Xuannan
On Sep 15, 2020, 6:18 PM +0800, Aljoscha Krettek ,
On 15.09.20 10:54, Xuannan Su wrote:
One way of solving this is to let the CatalogManager probe the existence of the
IntermediateResult so that the planner can decide if the cache table should be
used.
That could be a reasonable solution, yes.
Best,
Aljoscha
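
For readers following along, the probing idea quoted above can be sketched roughly as follows. This is only an illustration: `IntermediateResultProbe`, `InMemoryProbe`, and `choosePlan` are hypothetical names made up for the sketch, not Flink classes or anything taken from the FLIP.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; none of these types are Flink's actual classes.
class CacheProbeSketch {

    /** Stand-in for whatever component can tell whether a cached result still exists. */
    interface IntermediateResultProbe {
        boolean exists(String intermediateResultId);
    }

    /** Toy probe backed by an in-memory set of live result ids. */
    static class InMemoryProbe implements IntermediateResultProbe {
        private final Set<String> liveResults = new HashSet<>();

        void register(String id) { liveResults.add(id); }

        void drop(String id) { liveResults.remove(id); }

        @Override
        public boolean exists(String id) { return liveResults.contains(id); }
    }

    /** Planner-side decision: read the cache if it is still there, otherwise recompute. */
    static String choosePlan(IntermediateResultProbe probe, String resultId) {
        return probe.exists(resultId) ? "SCAN_CACHE" : "RECOMPUTE_ORIGINAL_QUERY";
    }

    public static void main(String[] args) {
        InMemoryProbe probe = new InMemoryProbe();
        probe.register("result-1");
        System.out.println(choosePlan(probe, "result-1"));
        probe.drop("result-1");
        System.out.println(choosePlan(probe, "result-1"));
    }
}
```

The point of probing at planning time is that a job never has to be submitted, fail, and be retried just to find out that the cache is gone.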
Hi Aljoscha,
I thought about relying on the failover mechanism to re-execute the whole graph
when the cache doesn’t exist. The only concern I have is that every job that
uses the cache table in the per-job cluster will have to go through the
following process,
job submit -> job fail because of
On 15.09.20 07:00, Xuannan Su wrote:
Thanks for your comment. I agree that we should not tightly couple the
execution environment to the PipelineExecutor. With that in mind, to
distinguish between per-job and session mode, we can introduce a new method, named
isPerJobModeExecutor, in
Hi Aljoscha,
Thanks for your comment. I agree that we should not tightly couple the
execution environment to the PipelineExecutor. With that in mind, to
distinguish between per-job and session mode, we can introduce a new method, named
isPerJobModeExecutor, in the
On 10.09.20 09:00, Xuannan Su wrote:
How do you imagine that? Where do you distinguish between per-job and
session mode?
The StreamExecutionEnvironment can distinguish between per-job and session mode
by the type of the PipelineExecutor, i.e., AbstractJobClusterExecutor vs
Hi Xuannan,
Thanks for updating the FLIP and answering my questions.
The FLIP is good to go from my side. What do others think?
Regards,
Timo
On 10.09.20 09:00, Xuannan Su wrote:
Hi Timo,
Thanks for pointing out the mistake; I have updated the FLIP accordingly.
And I added more information
Hi Timo,
Thanks for pointing out the mistake; I have updated the FLIP accordingly.
And I added more information to the FLIP to address the problem you raised in
the email.
To sum it up:
> 1. What does `ClusterPartitionDescriptor` look like?
The ClusterPartitionDescriptor will include the
Hi Xuannan,
thanks for the update. Here is some final feedback from my side. Maybe
others have some final feedback as well before we proceed to a vote?
Some mistakes that we should fix in the FLIP:
- The FLIP declares `Table cache();` but I guess this should be
`CachedTable cache();` now.
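
A minimal sketch of why the return type matters, for illustration only. The assumptions are that `CachedTable` is a subtype of `Table` and that it carries cache-specific methods; `describe`, `invalidateCache`, `isCacheValid`, and the two `Simple*` classes are hypothetical names for this sketch and are not copied from the FLIP or from Flink.

```java
// Hypothetical sketch; these interfaces only mimic the shape discussed in the thread.
class CachedTableSketch {

    interface Table {
        String describe();

        // Returning CachedTable (not Table) exposes cache-specific methods without a cast.
        CachedTable cache();
    }

    /** Assumed to extend Table so a cached table can be used wherever a table is expected. */
    interface CachedTable extends Table {
        void invalidateCache();   // hypothetical cache-specific method

        boolean isCacheValid();   // hypothetical, used only by this demo
    }

    static class SimpleTable implements Table {
        private final String query;

        SimpleTable(String query) { this.query = query; }

        @Override
        public String describe() { return query; }

        @Override
        public CachedTable cache() { return new SimpleCachedTable(this); }
    }

    static class SimpleCachedTable implements CachedTable {
        private final Table origin;
        private boolean valid = true;

        SimpleCachedTable(Table origin) { this.origin = origin; }

        @Override
        public String describe() { return "cached(" + origin.describe() + ")"; }

        @Override
        public CachedTable cache() { return this; }

        @Override
        public void invalidateCache() { valid = false; }

        @Override
        public boolean isCacheValid() { return valid; }
    }

    public static void main(String[] args) {
        CachedTable cached = new SimpleTable("SELECT * FROM orders").cache();
        System.out.println(cached.describe());
        cached.invalidateCache(); // compiles only because cache() returns CachedTable
        System.out.println(cached.isCacheValid());
    }
}
```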
Hi Timo,
Thanks for your comments. After the offline discussion, I have updated the FLIP
with the following changes.
1. Update the end-to-end process
a. The Table.cache method should only wrap the original query operation with
CacheOperation.
b. The planner will add the CacheSink or
Hi Xuannan,
sorry for joining the discussion so late. I agree that this is a very
nice and useful feature. However, the impact it has on many components
in the stack requires more discussion in my opinion.
1) Separation of concerns:
The current design seems to mix different layers. We should
Hi folks,
It seems that all the raised concerns so far have been resolved. I plan to
start a voting thread for FLIP-36 early next week if there are no comments.
Thanks,
Xuannan
On Jul 28, 2020, 7:42 PM +0800, Xuannan Su , wrote:
> Hi Kurt,
>
> Thanks for the comments.
>
> You are right that the
Hi Kurt,
Thanks for the comments.
You are right that the FLIP lacks a proper discussion about the impact on the
optimizer. I have added a section on how the cache table works
with the optimizer. I hope this resolves your concern. Please let me know
if you have any further
Thanks for the reply. I have one more comment about the impact on the
optimizer. Even if you are trying to make the cached table as orthogonal to the
optimizer as possible by introducing a special sink, it is still not clear why
this approach is safe. Maybe you can add some process
introduction
Hi Kurt,
Thanks for the comments.
1. How do you identify the CachedTable?
For the current design proposed in FLIP-36, we are using the first approach you
mentioned, where the key of the map is the Cached Table java object. I think it
is fine not to be able to identify another table
Hi Xuannan,
Thanks for the detailed design doc; it clearly describes how the API looks
and how it interacts with the Flink runtime.
However, the part that relates to SQL's optimizer is kind of blurry. To be
more precise, I have the following questions:
1. How do you identify the CachedTable? I can
Hi folks,
I'd like to revive the discussion about FLIP-36 Support Interactive Programming
in Flink Table API
https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink
The FLIP proposes to add support for interactive programming in Flink Table
API.
Hi folks,
As the feature freeze of Flink 1.11 has passed and the release branch is
cut, I'd like to revive this discussion thread of FLIP-36[1]. A quick
summary of FLIP-36:
The FLIP proposes to add support for interactive programming in Flink Table
API. Specifically, it lets users cache the
Hi,
There is some feedback from @Timo and @Kurt in the voting thread for
FLIP-36, and I want to share my thoughts here.
1. What would FLIP-36 look like after FLIP-84?
I don't think FLIP-84 will affect FLIP-36 from the public API perspective.
Users can call .cache on a table object and the
Hi folks,
The FLIP-36 is updated according to the discussion with Becket. In the
meantime, any comments are very welcome.
If there are no further comments, I would like to start the voting
thread by tomorrow.
Thanks,
Xuannan
On Sun, Apr 26, 2020 at 9:34 AM Xuannan Su wrote:
> Hi Becket,
>
>
Hi Becket,
You are right. It makes sense to treat retry of job 2 as an ordinary job.
And the config does introduce some unnecessary confusion. Thank you for your
comment. I will update the FLIP.
Best,
Xuannan
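
To make the agreed behaviour concrete, here is a rough sketch of "retry with the full DAG when the cached intermediate result is missing". It is an assumption-laden illustration: `CacheMissingException` and `runWithFallback` are hypothetical names, and the real mechanism in Flink may look quite different.

```java
import java.util.function.Supplier;

// Hypothetical sketch of the fallback discussed in this thread; not Flink code.
class CacheFallbackSketch {

    /** Hypothetical signal that the cached intermediate result no longer exists. */
    static class CacheMissingException extends RuntimeException {
        CacheMissingException(String message) { super(message); }
    }

    /**
     * Runs the plan that reads the cache first. If the cache is missing, runs the
     * full DAG instead. A failure of the full DAG is not caught here, so it
     * surfaces like the failure of any ordinary job.
     */
    static <T> T runWithFallback(Supplier<T> planReadingCache, Supplier<T> fullDag) {
        try {
            return planReadingCache.get();
        } catch (CacheMissingException e) {
            return fullDag.get();
        }
    }

    public static void main(String[] args) {
        String result = CacheFallbackSketch.<String>runWithFallback(
                () -> { throw new CacheMissingException("intermediate result is gone"); },
                () -> "recomputed from full DAG");
        System.out.println(result);
    }
}
```

Because the retry is an ordinary job, no extra retry-count config is needed for the cache-miss case in this sketch.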
On Sat, Apr 25, 2020 at 7:44 AM Becket Qin wrote:
> Hi Xuannan,
>
> If user submits
Hi Xuannan,
If a user submits Job 1, which generates a cached intermediate result, and later
submits Job 2, which should ideally use that intermediate result, then
if Job 2 fails because the intermediate result is missing, Job 2
should be retried with its full DAG. After that when Job 2
Hi Becket,
The intermediate result will indeed be automatically re-generated by
resubmitting the original DAG. And that job could fail as well. In that
case, we need to decide if we should resubmit the original DAG to
re-generate the intermediate result or give up and throw an exception to
the
Hi Xuannan,
I am not entirely sure if I understand the cases you mentioned.
> The users can use the cached table object returned by the .cache() method in
> another job, and it should read the intermediate result. The intermediate
> result can be gone in the following three cases: 1. the user
Hi Becket,
Thanks for the comments.
On Fri, Apr 24, 2020 at 9:12 AM Becket Qin wrote:
> Hi Xuannan,
>
> Thanks for picking up the FLIP. It looks good to me overall. Some quick
> comments / questions below:
>
> 1. Do we also need changes in the Java API?
>
Yes, the public interface of Table
Hi Xuannan,
Thanks for picking up the FLIP. It looks good to me overall. Some quick
comments / questions below:
1. Do we also need changes in the Java API?
2. What are the cases that users may want to retry reading the intermediate
result? It seems that once the intermediate result has gone, it
Hi folks,
I'd like to start the discussion about FLIP-36 Support Interactive
Programming in Flink Table API
https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink
The FLIP proposes to add support for interactive programming in Flink Table
API.
The FLIP looks good and is quite detailed, thanks!
I think we should proceed to a vote on whether to accept this FLIP.
If the feature and design are accepted, the next step would be to have an
implementation breakdown.
Best,
Stephan
On Mon, May 6, 2019 at 4:18 AM Becket Qin wrote:
> Hi
Hi folks,
Just want to revive this discussion thread. A few of us had some offline
discussions around the implementation details of this FLIP.
Here I briefly summarize the offline discussion:
--
Some concerns were raised about the default implementation of the cache service.
1. The default cache
Thanks Piotr, for the +1 and all the patient discussion :)
On Wed, Mar 13, 2019 at 3:53 PM Piotr Nowojski wrote:
> Hi Becket,
>
> Thank you for driving the effort and writing down the detailed proposal.
> To me this FLIP looks good and it has +1 from me.
>
> Piotr Nowojski
>
> > On 12 Mar 2019,
Hi Becket,
Thank you for driving the effort and writing down the detailed proposal. To me
this FLIP looks good and it has +1 from me.
Piotr Nowojski
> On 12 Mar 2019, at 13:21, Becket Qin wrote:
>
> Hi folks,
>
> We would like to start the discussion thread about FLIP-36 support
>
Hi folks,
We would like to start the discussion thread about FLIP-36 support
interactive programming in Flink Table API.
https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink
There has been an extended discussion[1] in the mailing list. To quick