Re: [DISCUSS] FLIP-107: Reading table columns from different parts of source records

2020-09-09 Thread Timo Walther
Hi Kurt, thanks for sharing your opinion. I'm totally up for not reusing computed columns. I think Jark was a big supporter of this syntax, @Jark are you fine with this as well? The non-computed column approach was only a "slightly rejected alternative". Furthermore, we would need to think a

[jira] [Created] (FLINK-19169) Support Pandas UDAF in PyFlink (FLIP-137)

2020-09-09 Thread Huang Xingbo (Jira)
Huang Xingbo created FLINK-19169: Summary: Support Pandas UDAF in PyFlink (FLIP-137) Key: FLINK-19169 URL: https://issues.apache.org/jira/browse/FLINK-19169 Project: Flink Issue Type: Improve

Re: [DISCUSS] FLIP-140: Introduce bounded style execution for keyed streams

2020-09-09 Thread Dawid Wysakowicz
That's for sure. I am not claiming against it. What I am saying is that we don't necessarily need a true "sorting" in this particular use case. We only need to cluster records with the same keys together. We don't need the keys to be logically sorted. What I am saying is that for clustering the key

[jira] [Created] (FLINK-19170) Parameter naming error

2020-09-09 Thread sulei (Jira)
sulei created FLINK-19170: Summary: Parameter naming error Key: FLINK-19170 URL: https://issues.apache.org/jira/browse/FLINK-19170 Project: Flink Issue Type: Bug Reporter: sulei

[jira] [Created] (FLINK-19171) K8s Resource Manager may lead to resource leak after pod deleted

2020-09-09 Thread Yi Tang (Jira)
Yi Tang created FLINK-19171: Summary: K8s Resource Manager may lead to resource leak after pod deleted Key: FLINK-19171 URL: https://issues.apache.org/jira/browse/FLINK-19171 Project: Flink Issue

Re: [DISCUSS] FLIP-136: Improve interoperability between DataStream and Table API

2020-09-09 Thread Danny Chan
“But I think the planner needs to know whether the input is insert-only or not.” Does fromDataStream(dataStream, schema, changelogMode) solve your concerns? People can pass around whatever ChangelogMode they like as an optional param. By default: fromDataStream(dataStream, schema), the Changel

Re: [VOTE] FLIP-141: Intra-Slot Managed Memory Sharing

2020-09-09 Thread Andrey Zagrebin
+1 Best, Andrey On Tue, Sep 8, 2020 at 2:16 PM Yu Li wrote: > +1 > > Best Regards, > Yu > > > On Tue, 8 Sep 2020 at 17:03, Aljoscha Krettek wrote: > > > +1 > > > > We just need to make sure to find a good name before the release but > > shouldn't block any work on this. > > > > Aljoscha > > >

Re: [VOTE] FLIP-141: Intra-Slot Managed Memory Sharing

2020-09-09 Thread Andrey Zagrebin
For the option name, maybe: *flink.main* or *flink.managed* (this may be a bit confusing for existing users as we said that the overall managed memory is managed by Flink) On Wed, Sep 9, 2020 at 9:56 AM Andrey Zagrebin wrote: > +1 > > Best, > Andrey > > On Tue, Sep 8, 2020 at 2:16 PM Yu Li wrot

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-09 Thread Konstantin Knauf
Thanks for the initiative. Big +1. Would be interested to hear if the proposed interfaces still make sense in the face of the new fault-tolerance work that is planned. Stephan/Piotr will know. On Tue, Sep 8, 2020 at 7:05 PM Seth Wiesman wrote: > Hi Devs, > > I'd like to propose an update to how

Re: [DISCUSS] FLIP-107: Reading table columns from different parts of source records

2020-09-09 Thread Jark Wu
Hi everyone, I think we have a conclusion that the writable metadata shouldn't be defined as a computed column, but a normal column. "timestamp STRING SYSTEM_METADATA('timestamp')" is one of the approaches. However, it is not SQL standard compliant, we need to be cautious enough when adding new s

Re: [DISCUSS] FLIP-136: Improve interoperability between DataStream and Table API

2020-09-09 Thread Timo Walther
I had this in the initial design, but Jark had concerns at least for the `toChangelogStream(ChangelogMode)` (see earlier discussion). `fromDataStream(dataStream, schema, changelogMode)` would be possible. But in this case I would vote for a symmetric API. If we keep toChangelogStream we should

[jira] [Created] (FLINK-19172) [AbstractFileStateBackend]

2020-09-09 Thread Alessio Savi (Jira)
Alessio Savi created FLINK-19172: Summary: [AbstractFileStateBackend] Key: FLINK-19172 URL: https://issues.apache.org/jira/browse/FLINK-19172 Project: Flink Issue Type: Bug Componen

Re: [DISCUSS] FLIP-136: Improve interoperability between DataStream and Table API

2020-09-09 Thread Jark Wu
I prefer to have separate APIs for them as changelog stream requires Row type. It would make the API more straightforward and reduce the confusion. Best, Jark On Wed, 9 Sep 2020 at 16:21, Timo Walther wrote: > I had this in the initial design, but Jark had concerns at least for the > `toChangelo

Re: [DISCUSS] FLIP-107: Reading table columns from different parts of source records

2020-09-09 Thread Timo Walther
Hi Jark, now we are back at the original design proposed by Dawid :D Yes, we should be cautious about adding new syntax. But the length of this discussion shows that we are looking for a good long-term solution. In this case I would rather vote for a deep integration into the syntax. Compute

Re: [DISCUSS] FLIP-107: Reading table columns from different parts of source records

2020-09-09 Thread Kurt Young
I would vote for `offset INT SYSTEM_METADATA("offset")`. I don't think we can stick with the SQL standard in DDL part forever, especially as there are more and more requirements coming from different connectors and external systems. Best, Kurt On Wed, Sep 9, 2020 at 4:40 PM Timo Walther wrote

Re: [DISCUSS] FLIP-136: Improve interoperability between DataStream and Table API

2020-09-09 Thread Danny Chan
I think a different API name would bring in much confusion just because the DataStream generic type is different. If there is a ChangelogMode that only works for Row, can we have a type check there? Switching to a new API name does not really solve the problem well, people still need to decl

Re: [DISCUSS] FLIP-140: Introduce bounded style execution for keyed streams

2020-09-09 Thread Aljoscha Krettek
I think Kurt's concerns/comments are very valid and we need to implement such things in the future. However, I also think that we need to get started somewhere and I think what's proposed in this FLIP is a good starting point that we can build on. So we should not get paralyzed by thinking too f

Re: [DISCUSS] FLIP-140: Introduce bounded style execution for keyed streams

2020-09-09 Thread Kurt Young
Yes, I didn't intend to block this FLIP, and some of the comments are actually implementation details. And all of them are handled internally, not visible to users, thus we can also change or improve them in the future. Best, Kurt On Wed, Sep 9, 2020 at 5:03 PM Aljoscha Krettek wrote: > I thin

Re: [DISCUSS] FLIP-107: Reading table columns from different parts of source records

2020-09-09 Thread Danny Chan
"offset INT SYSTEM_METADATA("offset")" This is actually Oracle or MySQL style computed column syntax. "You are right that one could argue that "timestamp", "headers" are something like "key" and "value"" I have the same feeling, both key value and headers timestamp are *real* data stored in the

Re: [DISCUSS] FLIP-107: Reading table columns from different parts of source records

2020-09-09 Thread Jark Wu
Hi Danny, This is not Oracle and MySQL computed column syntax, because there is no "AS" after the type. Hi everyone, If we want to use "offset INT SYSTEM_METADATA("offset")", then I think we must further discuss the "PERSISTED" or "VIRTUAL" keyword for the query-sink schema problem. Personally, I t

[jira] [Created] (FLINK-19173) Add Pandas Batch Group Aggregation Function Operator

2020-09-09 Thread Huang Xingbo (Jira)
Huang Xingbo created FLINK-19173: Summary: Add Pandas Batch Group Aggregation Function Operator Key: FLINK-19173 URL: https://issues.apache.org/jira/browse/FLINK-19173 Project: Flink Issue Ty

Re: [DISCUSS] FLIP-141: Intra-Slot Managed Memory Sharing

2020-09-09 Thread Stephan Ewen
Hi! I read through the FLIP and it looks good to me. One suggestion and one question: Regarding naming, we could call the ROCKSDB/BATCH_OP category DATAPROC because this is the memory that goes into holding (and structuring) the data. I am a bit confused about the Scope enum (with values Slot and O

Re: [DISCUSS] FLIP-107: Reading table columns from different parts of source records

2020-09-09 Thread Leonard Xu
Hi everyone, I’m +1 for "offset INT SYSTEM_METADATA("offset")" if we have to make a choice. It’s not a generated column syntax and thus we can get rid of the limitations of generated columns. About distinguishing the read-only metadata and writeable metadata, I prefer to add keyword after SYSTE

Re: [DISCUSS] FLIP-107: Reading table columns from different parts of source records

2020-09-09 Thread Timo Walther
Hi everyone, "key" and "value" in the properties are a special case because they need to configure a format. So key and value are more than just metadata. Jark's example for setting a timestamp would work but as the FLIP discusses, we have way more metadata fields like headers, epoch-leader,

Re: [DISCUSS] FLIP-136: Improve interoperability between DataStream and Table API

2020-09-09 Thread Timo Walther
I agree with Jark. It reduces confusion. The DataStream API doesn't know changelog processing at all. A DataStream of Row can be used with both `fromDataStream` and `fromChangelogStream`. But only the latter API will interpret it as a changelog something. And as I mentioned before, the `toCh

Re: [DISCUSS] FLIP-141: Intra-Slot Managed Memory Sharing

2020-09-09 Thread Xintong Song
Thanks for the suggestion, @Stephan. DATAPROC makes good sense to me. +1 here Regarding the Scope, it is meant for calculating fractions from the weights. The idea is that the algorithm looks into the scopes and calculates fractions without understanding the individual use cases. I guess I shoul

[CANCEL][VOTE] FLIP-134: DataStream Semantics for Bounded Input

2020-09-09 Thread Aljoscha Krettek
I'm hereby cancelling this vote. There was more discussion on the [DISCUSS] thread for FLIP-134. Aljoscha On 24.08.20 11:33, Kostas Kloudas wrote: Hi all, After the discussion in [1], I would like to open a voting thread for FLIP-134 [2] which discusses the semantics that the DataStream API w

Re: [DISCUSS] FLIP-134: DataStream Semantics for Bounded Input

2020-09-09 Thread Aljoscha Krettek
I updated the FLIP, you can check out the changes here: https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=158871522&selectedPageVersions=16&selectedPageVersions=15 There is still the open question of what IGNORE means for getProcessingTime(). Plus, I introduced a sett

Re: [DISCUSS] FLIP-107: Reading table columns from different parts of source records

2020-09-09 Thread Dawid Wysakowicz
Hi, Sorry for joining so late. First of all, I don't want to distract the discussion, but I thought maybe my opinion could help a bit, but maybe it won't ;) The first observation I got is that I think everyone agrees we need a way to distinguish the read-only from r/w columns. Is that correct? Seco

Re: [DISCUSS] FLIP-107: Reading table columns from different parts of source records

2020-09-09 Thread Danny Chan
“Personally, I still like the computed column design more because it allows to have full flexibility to compute the final column” I have the same feeling, the non-standard syntax "timestamp INT SYSTEM_METADATA("ts")" is neither a computed column nor normal column. It looks very likely a computed c

Re: [DISCUSS] FLIP-136: Improve interoperability between DataStream and Table API

2020-09-09 Thread Danny Chan
Thanks, I'm fine with that. On Wed, Sep 9, 2020 at 7:02 PM Timo Walther wrote: > I agree with Jark. It reduces confusion. > > The DataStream API doesn't know changelog processing at all. A > DataStream of Row can be used with both `fromDataStream` and > `fromChangelogStream`. But only the latter API will inte

Re: [DISCUSS] FLIP-107: Reading table columns from different parts of source records

2020-09-09 Thread Timo Walther
+1 for: timestamp INT METADATA [FROM 'my-timestamp-field'] However, I would inverse the default. Because reading is more common than writing. Regards, Timo On 09.09.20 14:25, Danny Chan wrote: “Personally, I still like the computed column design more because it allows to have full flexibil

Re: [DISCUSS] FLIP-107: Reading table columns from different parts of source records

2020-09-09 Thread Leonard Xu
Thanks @Dawid for the nice summary, I think you caught all opinions of the long discussion well. @Danny “ timestamp INT METADATA [FROM 'my-timestamp-field'] [VIRTUAL] Note that the "FROM 'field name'" is only needed when the name conflicts with the declared table column name, when there are n

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-09 Thread Yun Tang
Hi Seth, thanks for bringing up this discussion, and I really like this refactoring to give cleaner concepts! When we talk about the relationship between state, state backends, and snapshots, the 'CheckpointStorage' only focuses on how to persist the checkpointed state (to JM or to DFS), there sti

Re: [DISCUSS] FLIP-107: Reading table columns from different parts of source records

2020-09-09 Thread Jark Wu
I'm also +1 to Danny's proposal: timestamp INT METADATA [FROM 'my-timestamp-field'] [VIRTUAL] Especially I like the shortcut: timestamp INT METADATA, this makes the most common case to be supported in the simplest way. I also think the default should be "PERSISTED", so VIRTUAL is optional when you

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-09 Thread Aljoscha Krettek
I like it a lot! I think it makes sense to clean this up despite the planned new fault-tolerance mechanisms. In the future, users will decide which mechanism to use and I can imagine that a lot of them will keep using the current mechanism for quite a while to come. But I'm happy to yield to

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-09 Thread Seth Wiesman
@Yun yes, this is really about making CheckpointStorage an orthogonal concept. I think we can remain pragmatic and keep state-backend specific configurations (async, incremental, etc) in the state backend themselves. I view these as more advanced configurations and by the time someone is changing t

[DISCUSS] Deprecate and remove UnionList OperatorState

2020-09-09 Thread Aljoscha Krettek
Hi Devs, @Users: I'm cc'ing the user ML to see if there are any users that are relying on this feature. Please comment here if that is the case. I'd like to discuss the deprecation and eventual removal of UnionList Operator State, aka Operator State with Union Redistribution. If you don't kn

Re: [DISCUSS] Releasing Flink 1.11.2

2020-09-09 Thread Zhu Zhu
Hi All, Just an update. All known blockers are resolved and I'm starting to create RC1 for release 1.11.2. Thanks, Zhu On Wed, Sep 9, 2020 at 11:36 AM Zhu Zhu wrote: > Thanks for reporting this issue and offering to fix it @Jingsong Li > > Agreed it is a reasonable blocker. I will postpone 1.11.2 RC1 creati

[jira] [Created] (FLINK-19174) idleTimeMsPerSecond can report incorrect values if task is blocked for more then 60 seconds

2020-09-09 Thread Piotr Nowojski (Jira)
Piotr Nowojski created FLINK-19174: Summary: idleTimeMsPerSecond can report incorrect values if task is blocked for more then 60 seconds Key: FLINK-19174 URL: https://issues.apache.org/jira/browse/FLINK-19174

Re: [DISCUSS] FLIP-107: Reading table columns from different parts of source records

2020-09-09 Thread Timo Walther
"If virtual by default, when a user types "timestamp int" ==> persisted column, then adds a "metadata" after that ==> virtual column, then adds a "persisted" after that ==> persisted column." Thanks for this nice mental model explanation, Jark. This makes total sense to me. Also making the the

Re: [DISCUSS] Deprecate and remove UnionList OperatorState

2020-09-09 Thread Arvid Heise
+1 to getting rid of non-keyed state as is in general and for union state in particular. I had a hard time to wrap my head around the semantics of non-keyed state when designing the rescale of unaligned checkpoint. The only plausible use cases are legacy source and sinks. Both should also be rewor

[jira] [Created] (FLINK-19175) Tests in JoinITCase do not test BroadcastHashJoin

2020-09-09 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-19175: Summary: Tests in JoinITCase do not test BroadcastHashJoin Key: FLINK-19175 URL: https://issues.apache.org/jira/browse/FLINK-19175 Project: Flink Iss

Re: [DISCUSS] Deprecate and remove UnionList OperatorState

2020-09-09 Thread Seth Wiesman
Generally +1. The one use case of union state I've seen in production (outside of sources and sinks) is as a "poor man's" broadcast state. This was obviously before that feature was added which is now a few years ago so I don't know if those pipelines still exist. FWIW, if they do the stat

[DISCUSS] Support source/sink parallelism config in Flink sql

2020-09-09 Thread admin
Hi devs: Currently, Flink SQL does not support source/sink parallelism config. So, it will result in wasting or lacking resources in some cases. I think it is necessary to introduce configuration of source/sink parallelism in SQL. From my side, I have the solution for this feature. Add parallelism con

[jira] [Created] (FLINK-19176) Support ScalaPB as a message payload serializer in Stateful Functions

2020-09-09 Thread Galen Warren (Jira)
Galen Warren created FLINK-19176: Summary: Support ScalaPB as a message payload serializer in Stateful Functions Key: FLINK-19176 URL: https://issues.apache.org/jira/browse/FLINK-19176 Project: Flink

add support for ScalaPB-based message-payload serialization to Stateful Functions?

2020-09-09 Thread Galen Warren
Hi all -- I created a ticket regarding a proposal to add a new message-payload serialization method to Stateful Functions, based on ScalaPB. This would be very similar to the existing support for protobuf serialization based on code generated for J

Re: [VOTE] FLIP-141: Intra-Slot Managed Memory Sharing

2020-09-09 Thread Xintong Song
Thanks everyone, I'm closing this vote now in a separate email. Concerning the naming, I will use DATAPROC, as @Stephan suggested in the discussion thread [1], for now. If there are any other opinions, feel free to reach out to me anytime before the release. Thank you~ Xintong Song [1] http:/

[RESULT][VOTE] FLIP-141: Intra-Slot Managed Memory Sharing

2020-09-09 Thread Xintong Song
Hi devs, I'm happy to announce that FLIP-141[1] is officially approved. The vote [2] has been opened for more than 72h + weekends, and we have received 9 +1s, 8 of which are binding, and no vetos. Thanks everyone for participating. * Xintong (binding) * Till (binding) * Dian (binding) * Zhu (bin

Re: [VOTE] FLIP-139: General Python User-Defined Aggregate Function on Table API

2020-09-09 Thread Dian Fu
+1(binding) Regards, Dian > On Sep 8, 2020, at 7:43 AM, jincheng sun wrote: > > +1(binding) > > Best, > Jincheng > > On Mon, Sep 7, 2020 at 5:45 PM Xingbo Huang wrote: > >> Hi, >> >> +1 (non-binding) >> >> Best, >> Xingbo >> >> On Mon, Sep 7, 2020 at 2:37 PM Wei Zhong wrote: >> >>> Hi all, >>> >>> I would like to start the vo

Re: [VOTE] FLIP-139: General Python User-Defined Aggregate Function on Table API

2020-09-09 Thread Hequn Cheng
+1 (binding) Best, Hequn On Thu, Sep 10, 2020 at 10:03 AM Dian Fu wrote: > +1(binding) > > Regards, > Dian > > > On Sep 8, 2020, at 7:43 AM, jincheng sun wrote: > > > > +1(binding) > > > > Best, > > Jincheng > > > > On Mon, Sep 7, 2020 at 5:45 PM Xingbo Huang wrote: > > > >> Hi, > >> > >> +1 (non-binding) > >> > >> Bes

[jira] [Created] (FLINK-19177) FLIP-141: Intra-Slot Managed Memory Sharing

2020-09-09 Thread Xintong Song (Jira)
Xintong Song created FLINK-19177: Summary: FLIP-141: Intra-Slot Managed Memory Sharing Key: FLINK-19177 URL: https://issues.apache.org/jira/browse/FLINK-19177 Project: Flink Issue Type: Impro

[jira] [Created] (FLINK-19178) Introduce the memory weights configuration option

2020-09-09 Thread Xintong Song (Jira)
Xintong Song created FLINK-19178: Summary: Introduce the memory weights configuration option Key: FLINK-19178 URL: https://issues.apache.org/jira/browse/FLINK-19178 Project: Flink Issue Type:

[jira] [Created] (FLINK-19179) Implement the new fraction calculation logic

2020-09-09 Thread Xintong Song (Jira)
Xintong Song created FLINK-19179: Summary: Implement the new fraction calculation logic Key: FLINK-19179 URL: https://issues.apache.org/jira/browse/FLINK-19179 Project: Flink Issue Type: Sub-

[jira] [Created] (FLINK-19180) Make RocksDB respect the calculated fraction

2020-09-09 Thread Xintong Song (Jira)
Xintong Song created FLINK-19180: Summary: Make RocksDB respect the calculated fraction Key: FLINK-19180 URL: https://issues.apache.org/jira/browse/FLINK-19180 Project: Flink Issue Type: Sub-

[jira] [Created] (FLINK-19182) Update document for intra-slot managed memory sharing

2020-09-09 Thread Xintong Song (Jira)
Xintong Song created FLINK-19182: Summary: Update document for intra-slot managed memory sharing Key: FLINK-19182 URL: https://issues.apache.org/jira/browse/FLINK-19182 Project: Flink Issue T

[jira] [Created] (FLINK-19181) Make python processes respect the calculated fraction

2020-09-09 Thread Xintong Song (Jira)
Xintong Song created FLINK-19181: Summary: Make python processes respect the calculated fraction Key: FLINK-19181 URL: https://issues.apache.org/jira/browse/FLINK-19181 Project: Flink Issue T

[jira] [Created] (FLINK-19183) flink-connector-hive module compile failed with "cannot find symbol: variable TableEnvUtil"

2020-09-09 Thread Dian Fu (Jira)
Dian Fu created FLINK-19183: Summary: flink-connector-hive module compile failed with "cannot find symbol: variable TableEnvUtil" Key: FLINK-19183 URL: https://issues.apache.org/jira/browse/FLINK-19183 Pro

[jira] [Created] (FLINK-19184) Add Batch Physical Pandas Group Aggregate Rule and RelNode

2020-09-09 Thread Huang Xingbo (Jira)
Huang Xingbo created FLINK-19184: Summary: Add Batch Physical Pandas Group Aggregate Rule and RelNode Key: FLINK-19184 URL: https://issues.apache.org/jira/browse/FLINK-19184 Project: Flink Is