[DISCUSS] FLIP-140: Introduce bounded style execution for keyed streams

2020-08-31 Thread Dawid Wysakowicz
Hi devs, As described in the FLIP-131[1] we intend to deprecate and remove the DataSet API in the future in favour of the DataStream API for both bounded/batch and unbounded/streaming jobs. Ideally, we should be able to stay in the same performance ballpark with bounded DataStream programs as equi

[jira] [Created] (FLINK-19107) Add basic checkpoint and recovery config keys to template flink-conf.yaml

2020-08-31 Thread Tzu-Li (Gordon) Tai (Jira)
Tzu-Li (Gordon) Tai created FLINK-19107: --- Summary: Add basic checkpoint and recovery config keys to template flink-conf.yaml Key: FLINK-19107 URL: https://issues.apache.org/jira/browse/FLINK-19107

Re: [DISCUSS] Introduce partitioning strategies to Table/SQL

2020-08-31 Thread Jingsong Li
Thanks Konstantin and Benchao for your response. If we need to push forward the implementation, it should be a FLIP. My original intention was to unify the partition definitions for batches and streams: - What is "PARTITION" on a table? Partitions define the physical storage form of a table. Dif

[jira] [Created] (FLINK-19106) Add more timeout options for remote function specs

2020-08-31 Thread Tzu-Li (Gordon) Tai (Jira)
Tzu-Li (Gordon) Tai created FLINK-19106: --- Summary: Add more timeout options for remote function specs Key: FLINK-19106 URL: https://issues.apache.org/jira/browse/FLINK-19106 Project: Flink

Re: [DISCUSS] FLIP-136: Improve interoperability between DataStream and Table API

2020-08-31 Thread Jark Wu
Hi Timo, Thanks a lot for the great proposal and sorry for the late reply. This is an important improvement for DataStream and Table API users. I have listed my thoughts and questions below ;-) ## Conversion of DataStream to Table 1. "We limit the usage of `system_rowtime()/system_proctime` to

Re: [DISCUSS] FLIP-139: General Python User-Defined Aggregate Function on Table API

2020-08-31 Thread Wei Zhong
Hi Timo, Thanks for your notification. I’ll remove it from the design doc. Best, Wei > On Aug 31, 2020, at 21:11, Timo Walther wrote: > > Hi Wei, > > is `reset_accumulator` still necessary? We dropped it recently in the Java > API because it was not used anymore by the planner. > > Regards, > Timo >

[jira] [Created] (FLINK-19105) Table API Sample Code Error

2020-08-31 Thread weizheng (Jira)
weizheng created FLINK-19105: Summary: Table API Sample Code Error Key: FLINK-19105 URL: https://issues.apache.org/jira/browse/FLINK-19105 Project: Flink Issue Type: Improvement Compone

Re: [DISCUSS] Introduce partitioning strategies to Table/SQL

2020-08-31 Thread Benchao Li
Hi Jingsong, Thanks for bringing up this discussion. I like this idea generally. I'd like to add some cases we met in our scenarios. ## Source Partition By There is a use case where users want to do some lookups in the UDF, which is very much like a dimension table. It's common for them to cache so

Re: Next Stateful Functions Release

2020-08-31 Thread Seth Wiesman
+1 for Sept 10. Do you think we'd be able to get a fix for FLINK-18894 by then? https://issues.apache.org/jira/browse/FLINK-18894 Seth On Mon, Aug 31, 2020 at 4:02 AM Yu Li wrote: > +1 for releasing StateFun 2.2 with Sep. 10th as feature freeze date. Thanks > for driving this Igal! > > Best Re

Re: [DISCUSS] FLIP-139: General Python User-Defined Aggregate Function on Table API

2020-08-31 Thread Timo Walther
Hi Wei, is `reset_accumulator` still necessary? We dropped it recently in the Java API because it was not used anymore by the planner. Regards, Timo On 31.08.20 15:00, Wei Zhong wrote: Hi Jincheng & Xingbo, Thanks for your suggestions. I agree that we should keep the user interface uniform

Re: [DISCUSS] FLIP-139: General Python User-Defined Aggregate Function on Table API

2020-08-31 Thread Wei Zhong
Hi Jincheng & Xingbo, Thanks for your suggestions. I agree that we should keep the user interface uniform. I'll adjust the design to allow users to specify the result type and accumulator type via @udaf. Best, Wei > On Aug 31, 2020, at 18:06, Xingbo Huang wrote: > > Hi Wei, > > Thanks a lot for the

Re: FileSystemHaServices and BlobStore

2020-08-31 Thread Khachatryan Roman
+ dev Blob store is used for jars, serialized job, and task information and logs. You can find some information at https://cwiki.apache.org/confluence/display/FLINK/FLIP-19%3A+Improved+BLOB+storage+architecture I guess in your setup, Flink was able to pick up local files. HA setup presumes that

[ANNOUNCE] Weekly Community Update 2020/35

2020-08-31 Thread Konstantin Knauf
Dear community, happy to share a brief community update for the past week with configurable memory sharing between Flink and its Python "side car", stateful Python UDFs, an introduction of our GSoD participants and a little bit more. Flink Development == * [datastream api] Dawid has

Re: [DISCUSS] Introduce partitioning strategies to Table/SQL

2020-08-31 Thread Konstantin Knauf
Hi Jingsong, I would like to understand this FLIP (?) a bit better, but I am missing some background, I believe. So, some basic questions: 1) Does the PARTITION BY clause only have an effect for sink tables, defining how data should be partitioned in the sink system, or does it also make a difference

[jira] [Created] (FLINK-19104) how to run Fraud Detection walkthrough in Eclipse

2020-08-31 Thread David Anderson (Jira)
David Anderson created FLINK-19104: -- Summary: how to run Fraud Detection walkthrough in Eclipse Key: FLINK-19104 URL: https://issues.apache.org/jira/browse/FLINK-19104 Project: Flink Issue T

Re: [DISCUSS] FLIP-139: General Python User-Defined Aggregate Function on Table API

2020-08-31 Thread Xingbo Huang
Hi Wei, Thanks a lot for the discussion. Thanks a lot for Jincheng's suggestion of discussing FLIP-137 and FLIP-139 together. One question is whether we can use @udaf which is introduced in FLIP-137[1] to describe pandas udaf and general python udaf together. From the overall view of Python User

Re: [DISCUSS] FLIP-137: Support Pandas UDAF in PyFlink

2020-08-31 Thread Xingbo Huang
Hi Jincheng, Thanks a lot for joining the discussion and the suggestion of discussing FLIP-137 and FLIP-139 together. >> 1. We also need to consider how pandas UDAF supports metrics, and whether we need a custom interface for pandas UDAF? Yes. We need to add an interface so that users can add so

Re: [DISCUSS] FLIP-138: Declarative Resource management

2020-08-31 Thread Zhu Zhu
Thanks for the clarification @Till Rohrmann >> # Implications for the scheduling Agreed that it turned out to be different execution strategies for batch jobs. We can have a simple one first and improve it later. Thanks, Zhu On Mon, Aug 31, 2020 at 3:05 PM, Xintong Song wrote: > Thanks for the clarification,

Re: [DISCUSS] Remove Kafka 0.10.x connector (and possibly 0.11.x)

2020-08-31 Thread Arvid Heise
+1 to remove these two connectors. It's a lot of baggage in comparison to the workarounds that still make it possible to use older Kafka clusters. On Fri, Aug 28, 2020 at 12:06 PM Aljoscha Krettek wrote: > Yes, that should be the process. But I'd try it in a testing environment > before doing it

[jira] [Created] (FLINK-19103) The PushPartitionIntoTableSourceScanRule will lead a performance problem when there are still many partitions after pruning

2020-08-31 Thread fa zheng (Jira)
fa zheng created FLINK-19103: Summary: The PushPartitionIntoTableSourceScanRule will lead a performance problem when there are still many partitions after pruning Key: FLINK-19103 URL: https://issues.apache.org/jira/b

[jira] [Created] (FLINK-19102) Make StateBinder a per-FunctionType entity

2020-08-31 Thread Tzu-Li (Gordon) Tai (Jira)
Tzu-Li (Gordon) Tai created FLINK-19102: --- Summary: Make StateBinder a per-FunctionType entity Key: FLINK-19102 URL: https://issues.apache.org/jira/browse/FLINK-19102 Project: Flink Issu

Re: Next Stateful Functions Release

2020-08-31 Thread Yu Li
+1 for releasing StateFun 2.2 with Sep. 10th as feature freeze date. Thanks for driving this Igal! Best Regards, Yu On Mon, 31 Aug 2020 at 13:53, Tzu-Li (Gordon) Tai wrote: > +1, 10 Sept. also sounds like a reasonable feature freeze date considering > that most proposed features are already ei

[jira] [Created] (FLINK-19101) The SelectivityEstimator throw an NullPointerException when convertValueInterval with string type

2020-08-31 Thread fa zheng (Jira)
fa zheng created FLINK-19101: Summary: The SelectivityEstimator throw an NullPointerException when convertValueInterval with string type Key: FLINK-19101 URL: https://issues.apache.org/jira/browse/FLINK-19101

[jira] [Created] (FLINK-19100) Fix note about hadoop dependency from flink-avro

2020-08-31 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-19100: Summary: Fix note about hadoop dependency from flink-avro Key: FLINK-19100 URL: https://issues.apache.org/jira/browse/FLINK-19100 Project: Flink Issu

Re: [VOTE] Remove deprecated DataStream#fold and DataStream#split in 1.12

2020-08-31 Thread Konstantin Knauf
+1 On Mon, Aug 31, 2020 at 9:16 AM Timo Walther wrote: > +1 > > Thanks for removing legacy. > > Regards, > Timo > > On 28.08.20 11:55, David Anderson wrote: > > +1 > > > > David > > > > On Fri, Aug 28, 2020 at 9:41 AM Dawid Wysakowicz > > > wrote: > > > >> Hi all, > >> > >> I would like to star

[jira] [Created] (FLINK-19099) consumer kafka message repeat

2020-08-31 Thread zouwenlong (Jira)
zouwenlong created FLINK-19099: -- Summary: consumer kafka message repeat Key: FLINK-19099 URL: https://issues.apache.org/jira/browse/FLINK-19099 Project: Flink Issue Type: Bug Component

[jira] [Created] (FLINK-19098) Make Rowdata converters public

2020-08-31 Thread Brian Zhou (Jira)
Brian Zhou created FLINK-19098: -- Summary: Make Rowdata converters public Key: FLINK-19098 URL: https://issues.apache.org/jira/browse/FLINK-19098 Project: Flink Issue Type: Improvement

Re: [VOTE] Remove deprecated DataStream#fold and DataStream#split in 1.12

2020-08-31 Thread Timo Walther
+1 Thanks for removing legacy. Regards, Timo On 28.08.20 11:55, David Anderson wrote: +1 David On Fri, Aug 28, 2020 at 9:41 AM Dawid Wysakowicz wrote: Hi all, I would like to start a vote for removing deprecated, but Public(Evolving) methods in the upcoming 1.12 release: - XxxDataSt

[jira] [Created] (FLINK-19097) Support add_jar() for Python DataStream API

2020-08-31 Thread Shuiqiang Chen (Jira)
Shuiqiang Chen created FLINK-19097: -- Summary: Support add_jar() for Python DataStream API Key: FLINK-19097 URL: https://issues.apache.org/jira/browse/FLINK-19097 Project: Flink Issue Type: I

Re: [DISCUSS] FLIP-138: Declarative Resource management

2020-08-31 Thread Xintong Song
Thanks for the clarification, @Till. - For FLIP-56, sounds good to me. I think there should be no problem before removing AllocationID. And even after replacing AllocationID, it should only require limited effort to make FLIP-56 work with SlotID. I was just trying to understand when the effort wil