[DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-08-12 Thread Kostas Kloudas
Hi all, As described in FLIP-131 [1], we are aiming at deprecating the DataSet API in favour of the DataStream API and the Table API. After this work is done, the user will be able to write a program using the DataStream API and this will execute efficiently on both bounded and unbounded data. But

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-08-16 Thread Kurt Young
Hi Kostas, Thanks for starting this discussion. The first part of this FLIP: "Batch vs Streaming Scheduling" looks reasonable to me. However, there is another dimension I think we should also take into consideration, which is whether checkpointing is enabled. This option is orthogonal (but not fu

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-08-17 Thread David Anderson
Kostas, I'm pleased to see some concrete details in this FLIP. I wonder if the current proposal goes far enough in the direction of recognizing the need some users may have for "batch" and "bounded streaming" to be treated differently. If I've understood it correctly, the section on scheduling al

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-08-17 Thread Kostas Kloudas
Hi Kurt and David, Thanks a lot for the insightful feedback! @Kurt: For the topic of checkpointing with Batch Scheduling, I totally agree with you that it requires a lot more work and careful thinking on the semantics. This FLIP was written under the assumption that if the user wants to have chec

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-08-18 Thread Dawid Wysakowicz
> *CC:* dev, user > *Subject:* Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input > > Hi Kurt and David, > > Thanks a lot for the insightful feedback! > > @Kurt: For the topic of checkpointing with Batch Scheduling, I

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-08-18 Thread Kostas Kloudas
> Best, > Yun > > > --Original Mail ---------- > Sender: Kostas Kloudas > Send Date: Tue Aug 18 02:24:21 2020 > Recipients: David Anderson > CC: dev, user > Subject: Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input >> >

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-08-18 Thread David Anderson
Being able to optionally fire registered processing time timers at the end of a job would be interesting, and would help in (at least some of) the cases I have in mind. I don't have a better idea. David On Mon, Aug 17, 2020 at 8:24 PM Kostas Kloudas wrote: > Hi Kurt and David, > > Thanks a lot

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-08-20 Thread Kostas Kloudas
Hi all, Thanks for the comments! @Dawid: "execution.mode" can be a nice alternative and from a quick look it is not used currently by any configuration option. I will update the FLIP accordingly. @David: Given that having the option to allow timers to fire at the end of the job is already in the

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-08-24 Thread Guowei Ma
Hi Klou, Thanks for your proposal. It's a very good idea. Just a small comment about "Batch vs Streaming Scheduling": in the AUTOMATIC execution mode we may not be able to pick BATCH execution mode even if all sources are bounded. For example, some applications would use the `CheckpointListener`
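
Editorial sketch of the point raised above, not Flink code: one way an AUTOMATIC mode could be resolved is to pick BATCH only when every source is bounded and the job does not rely on checkpoint callbacks. The class `ExecutionModeSketch`, the `Mode` enum, the `resolve` method, and the `needsCheckpoints` flag are all hypothetical names made up for illustration; the thread only establishes that AUTOMATIC looks at source boundedness and that checkpoint-dependent applications may be a problem for it.

```java
import java.util.List;

public class ExecutionModeSketch {
    // Hypothetical enum mirroring the modes named in the thread; not Flink's actual type.
    enum Mode { STREAMING, BATCH, AUTOMATIC }

    // Resolves the configured mode to a concrete one.
    // sourceIsBounded: one flag per source, true if that source is bounded.
    // needsCheckpoints: assumed flag for jobs that depend on checkpoint callbacks.
    static Mode resolve(Mode configured, List<Boolean> sourceIsBounded, boolean needsCheckpoints) {
        if (configured != Mode.AUTOMATIC) {
            return configured; // an explicit choice is kept as-is
        }
        // A job with no sources is treated as not bounded in this sketch.
        boolean allBounded = !sourceIsBounded.isEmpty()
                && sourceIsBounded.stream().allMatch(Boolean::booleanValue);
        return (allBounded && !needsCheckpoints) ? Mode.BATCH : Mode.STREAMING;
    }

    public static void main(String[] args) {
        System.out.println(resolve(Mode.AUTOMATIC, List.of(true, true), false));  // BATCH
        System.out.println(resolve(Mode.AUTOMATIC, List.of(true, false), false)); // STREAMING
        System.out.println(resolve(Mode.AUTOMATIC, List.of(true, true), true));   // STREAMING
    }
}
```

Whether such a check belongs in the mode resolution at all is exactly the open question in this message; the sketch only makes the trade-off concrete.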

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-08-24 Thread Kostas Kloudas
Hi Guowei, Thanks for the insightful comment! I agree that this can be a limitation of the current runtime, but I think that this FLIP can go on as it discusses mainly the semantics that the DataStream API will expose when applied on bounded data. There will definitely be other FLIPs that will ac

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-08-24 Thread Kostas Kloudas
Thanks a lot for the discussion! I will open a voting thread shortly! Kostas On Mon, Aug 24, 2020 at 9:46 AM Kostas Kloudas wrote: > > Hi Guowei, > > Thanks for the insightful comment! > > I agree that this can be a limitation of the current runtime, but I > think that this FLIP can go on as it

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-08-25 Thread Aljoscha Krettek
Thanks for creating this FLIP! I think the general direction is very good but I think there are some specifics that we should also put in there and that we may need to discuss here as well. ## About batch vs streaming scheduling I think we shouldn't call it "scheduling", because the decision b

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-08-28 Thread Dawid Wysakowicz
@Aljoscha Let me bring back to the ML some of the points we discussed offline. Ad. 1 Yes I agree it's not just about scheduling. It includes more changes to the runtime. We might need to make it more prominent in the write up. Ad. 2 You have a good point here that switching the default value for

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-09-08 Thread Dawid Wysakowicz
Hey Aljoscha A couple of thoughts for the two remaining TODOs in the doc: # Processing Time Support in BATCH/BOUNDED execution mode I think there are two somewhat orthogonal problems around this topic:     1. Firing processing timers at the end of the job     2. Having processing timers in the B

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-09-08 Thread Aljoscha Krettek
I agree with almost all of your points! The only one where I could see that users want different behaviour BATCH jobs on the DataStream API. I agree that processing-time does not make much sense in batch jobs. However, if users have written some business logic using processing-time timers thei

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-09-08 Thread Dawid Wysakowicz
> The only one where I could see that users want different behaviour > BATCH jobs on the DataStream API. I agree that processing-time does > not make much sense in batch jobs. However, if users have written some > business logic using processing-time timers their jobs will silently > not work if we

Re: Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-08-17 Thread Yun Gao
Hi Kurt and David, Thanks a lot for the insightful feedback! @Kurt: For the topic of checkpointing with Batch Scheduling, I totally agree with you that it requires a lot more work and careful thinking on the semantics. This FLIP was written unde