Re: Re: [VOTE] FLIP-134: Batch execution for the DataStream API

2020-09-18 Thread Peter Huang
+1 Looking forward to the functionality for a long time. With this feature, our internal users can backfill data with the same code and minimal change of configuration. On Fri, Sep 18, 2020 at 3:08 AM Kostas Kloudas wrote: > +1 > > My only suggestion (although by no means a blocker) would be to

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-18 Thread Steven Wu
Aljoscha, > Instead the sink would have to check for each set of committables seperately if they had already been committed. Do you think this is feasible? Yes, that is how it works in our internal implementation [1]. We don't use checkpointId. We generate a manifest file (GlobalCommT) to bundle

Re: FileSystemHaServices and BlobStore

2020-09-18 Thread Alexey Trenikhun
Hi Yang, I saw this FLIP, it is very good feature, I think overall for Kubernetes, it is preferred over “StatefulSet + PV + FileSystemHAService” approach, when it will be available we plan to use it. On other hand looks like FileSystemHAService is easier to implement, I thought about contributin

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-18 Thread Seth Wiesman
1. With `FSStateBackend`, we used to decide where to store the checkpoint by the `state.backend.fs.memory-threshold` configuration, and we need to decide how to align with this behavior with the new implementation. I see this configuration available on the FileSystemStorage class. I've added that

Re: [DISCUSS] FLIP-33: Standardize connector metrics

2020-09-18 Thread Stephan Ewen
Having the watermark lag metric was the important part from my side. So +1 to go ahead. On Fri, Sep 18, 2020 at 4:11 PM Becket Qin wrote: > Hey Stephan, > > Thanks for the quick reply. I actually forgot to mention that I also added > a "watermarkLag" metric to the Source metrics in addition to

Timed out patterns handling using MATCH_RECOGNIZE

2020-09-18 Thread Kosma Grochowski
Hello, I would like to propose an enrichment of existing Flink SQL MATCH_RECOGNIZE syntax to cover for the case of the absence of an event. Such an enrichment would help our company solve a business case containing timed-out patterns handling. An example of usage of such a clause from Flink tra

Re: [RESULT][VOTE] FLIP-140: Introduce bounded style execution for keyed streams

2020-09-18 Thread Yu Li
+1(belated) for the latest FLIP document. Thanks for addressing my belated questions/comments and updating the doc accordingly, Dawid! Best Regards, Yu On Thu, 17 Sep 2020 at 21:31, Yu Li wrote: > I've just added some comments in the discussion thread [1]. Thanks. > > Best Regards, > Yu > > [

Re: [DISCUSS] FLIP-142: Disentangle StateBackends from Checkpointing

2020-09-18 Thread Yu Li
*bq. I agree with your assessment of the CheckpointStorage interface but I want to push back at including those changes as a part of this FLIP.* Makes sense, will start a separate discussion around this topic when prepared. *bq. One option could be to rename "CheckpointStorage" to "CheckpointStora

[jira] [Created] (FLINK-19294) Support key and value formats in Kafka connector

2020-09-18 Thread Timo Walther (Jira)
Timo Walther created FLINK-19294: Summary: Support key and value formats in Kafka connector Key: FLINK-19294 URL: https://issues.apache.org/jira/browse/FLINK-19294 Project: Flink Issue Type:

Re: [DISCUSS] FLIP-136: Improve interoperability between DataStream and Table API

2020-09-18 Thread Timo Walther
Hi Jark, the fieldNames map is not intended for users. I would also be fine to make it a default scope constructor and access it with some internal utility class next to the Row class. The fieldNames map must only be used by serializers and converters. A user has no benefit in using it. For

Re: [DISCUSS] FLIP-33: Standardize connector metrics

2020-09-18 Thread Becket Qin
Hey Stephan, Thanks for the quick reply. I actually forgot to mention that I also added a "watermarkLag" metric to the Source metrics in addition to the "currentFetchEventTimeLag" and "currentEmitEventTimeLag". So in the current proposal, both XXEventTimeLag will be based on event timestamps, whil

Re: [DISCUSS] FLIP-140: Introduce bounded style execution for keyed streams

2020-09-18 Thread Dawid Wysakowicz
> Ok, got your point now. I agree that it makes more sense to > make StateBackend return a contract instead of a particular > implementation. How about we name the new interface as > `CheckpointableKeyedStateBackend`? We could make > `BoundedStreamStateBackend` implement > `CheckpointableKeyedState

Re: [DISCUSS] FLIP-140: Introduce bounded style execution for keyed streams

2020-09-18 Thread Yu Li
*bq. The problem is that I could not use this "state backend" in a StreamOperator.* Ok, got your point now. I agree that it makes more sense to make StateBackend return a contract instead of a particular implementation. How about we name the new interface as `CheckpointableKeyedStateBackend`? We co

[jira] [Created] (FLINK-19293) RocksDB last_checkpoint.state_size grows endlessly until savepoint/restore

2020-09-18 Thread Thomas Wozniakowski (Jira)
Thomas Wozniakowski created FLINK-19293: --- Summary: RocksDB last_checkpoint.state_size grows endlessly until savepoint/restore Key: FLINK-19293 URL: https://issues.apache.org/jira/browse/FLINK-19293

[jira] [Created] (FLINK-19292) HiveCatalog should support specifying Hadoop conf dir with configuration

2020-09-18 Thread Rui Li (Jira)
Rui Li created FLINK-19292: -- Summary: HiveCatalog should support specifying Hadoop conf dir with configuration Key: FLINK-19292 URL: https://issues.apache.org/jira/browse/FLINK-19292 Project: Flink

Re: [DISCUSS] FLIP-140: Introduce bounded style execution for keyed streams

2020-09-18 Thread Dawid Wysakowicz
> === > /class BoundedStreamInternalStateBackend implements >         KeyedStateBackend, >         SnapshotStrategy>, >         Closeable, >         CheckpointListener {/ > ===/ > / The problem is that I could n

Re: [DISCUSS] FLIP-140: Introduce bounded style execution for keyed streams

2020-09-18 Thread Yu Li
Thanks for the clarification Dawid. Some of my thoughts: *bq. The results are times for end-to-end execution of a job. Therefore the sorting part is included. The actual target of the replacement is RocksDB, which does the serialization and key bytes comparison as well.* I see. Checking the FLIP m

Re: [VOTE] FLIP-36 - Support Interactive Programming in Flink Table API

2020-09-18 Thread Aljoscha Krettek
+1 (binding) Best, Aljoscha

Re: [DISCUSS] Deprecate and remove UnionList OperatorState

2020-09-18 Thread Aljoscha Krettek
On 12.09.20 17:20, Alexey Trenikhun wrote: We use union state to generate sequences, each operator generates offset0 + number-of-tasks - task-index + task-specific-counter * number-of-tasks (e.g. for 2 instances of operator -one instance produce even number, another odd). Last generated seque

Re: [VOTE] FLIP-36 - Support Interactive Programming in Flink Table API

2020-09-18 Thread Becket Qin
+1, thanks for driving through the FLIP, Xuannan! Cheers, Jiangjie (Becket) Qin On Thu, Sep 17, 2020 at 3:23 PM Timo Walther wrote: > +1 (binding) > > Looking forward to review the pull requests for this valuable feature. > > Regards, > Timo > > > On 16.09.20 07:35, Aljoscha Krettek wrote: > >

Re: [DISCUSS] FLIP-33: Standardize connector metrics

2020-09-18 Thread Stephan Ewen
Hi Becket! I am wondering if it makes sense to do the following small change: - Have "currentFetchEventTimeLag" be defined on event timestamps (optionally, if the source system exposes it) <== this is like in your proposal this helps understand how long the records were in the source bef

Re: [ANNOUNCE] Apache Flink 1.11.2 released

2020-09-18 Thread Zhijiang
Congratulations! Thanks for being the release manager Zhu Zhu and everyone involved in! Best, Zhijiang -- From:Lijie Wang Send Time:2020年9月18日(星期五) 17:48 To:dev@flink.apache.org Subject:Re:[ANNOUNCE] Apache Flink 1.11.2 released

Re: Re: [ANNOUNCE] New Apache Flink Committer - Godfrey He

2020-09-18 Thread Rui Li
Congrats Godfrey! Well deserved! On Fri, Sep 18, 2020 at 5:12 PM Yun Gao wrote: > Congratulations Godfrey! > > Best, > Yun > > > > --Original Mail -- > Sender:Dawid Wysakowicz > Send Date:Thu Sep 17 14:45:55 2020 > Recipients:Flink Dev , 贺小令 > Subject:Re: [ANNO

Re: Re: [VOTE] FLIP-134: Batch execution for the DataStream API

2020-09-18 Thread Kostas Kloudas
+1 My only suggestion (although by no means a blocker) would be to remove from the FLIP the `env.setRuntimeMode()` method. I say that because this is syntactic sugar over the `env.configure()` with the `execution.runtime-mode` option set to BATCH or STREAMING. These methods can be nice but they se

Re: [DISCUSS] FLIP-33: Standardize connector metrics

2020-09-18 Thread Becket Qin
Hi folks, Thanks for all the great feedback. I have just updated FLIP-33 wiki with the following changes: 1. Renaming. "currentFetchLatency" to "currentFetchEventTimeLag", "currentLatency" to "currentEmitEventTimeLag". 2. Added the public interface code change required for the new metrics. 3. Add

Re:[ANNOUNCE] Apache Flink 1.11.2 released

2020-09-18 Thread Lijie Wang
Congratulations! Thanks @ZhuZhu for driving this release ! On 09/17/2020 13:29,Zhu Zhu wrote: The Apache Flink community is very happy to announce the release of Apache Flink 1.11.2, which is the second bugfix release for the Apache Flink 1.11 series. Apache Flink® is an open-source stream pr

Re: [DISCUSS] FLIP-143: Unified Sink API

2020-09-18 Thread Aljoscha Krettek
Steven, we were also wondering if it is a strict requirement that "later" updates to Iceberg subsume earlier updates. In the current version, you only check whether checkpoint X made it to Iceberg and then discard all committable state from Flink state for checkpoints smaller X. If we go wit

Re: Re: [ANNOUNCE] Apache Flink 1.11.2 released

2020-09-18 Thread Guowei Ma
Thanks Zhuzhu for driving the release!!! Best, Guowei On Fri, Sep 18, 2020 at 5:10 PM Yun Gao wrote: > Great! Very thanks @ZhuZhu for driving this and thanks for all contributed > to the release! > > Best, > Yun > > --Original Mail -- > *Sender:*Jingsong Li >

Re: Avoiding "too many open files" during leftOuterJoin with Flink 1.11/batch

2020-09-18 Thread Chesnay Schepler
I don't think the segment-size will help here. If I understand the code correctly, then we have a fixed number of segments (# = memory/segment size), and if all segments are full we spill _all_ current segments in memory to disk into a single file, and re-use this file for future spilling unti

Re: Re: [ANNOUNCE] New Apache Flink Committer - Godfrey He

2020-09-18 Thread Yun Gao
Congratulations Godfrey! Best, Yun --Original Mail -- Sender:Dawid Wysakowicz Send Date:Thu Sep 17 14:45:55 2020 Recipients:Flink Dev , 贺小令 Subject:Re: [ANNOUNCE] New Apache Flink Committer - Godfrey He Congratulations! On 16/09/2020 06:19, Jark Wu wrote: > H

Re: Re: [ANNOUNCE] Apache Flink 1.11.2 released

2020-09-18 Thread Yun Gao
Great! Very thanks @ZhuZhu for driving this and thanks for all contributed to the release! Best, Yun --Original Mail -- Sender:Jingsong Li Send Date:Thu Sep 17 13:31:41 2020 Recipients:user-zh CC:dev , user , Apache Announce List Subject:Re: [ANNOUNCE] Apac

Re: [DISCUSS] FLIP-136: Improve interoperability between DataStream and Table API

2020-09-18 Thread Jark Wu
Personally I think the fieldNames Map is confusing and not handy. I just have an idea but not sure what you think. What about adding a new constructor with List field names, this enables all name-based setter/getters. Regarding to List -> Map cost for every record, we can suggest users to reuse the

[jira] [Created] (FLINK-19291) Meeting `SchemaParseException` when I use `AvroSchemaConverter` converts flink logical type

2020-09-18 Thread xiaozilong (Jira)
xiaozilong created FLINK-19291: -- Summary: Meeting `SchemaParseException` when I use `AvroSchemaConverter` converts flink logical type Key: FLINK-19291 URL: https://issues.apache.org/jira/browse/FLINK-19291

[jira] [Created] (FLINK-19290) Add documentation for Stateful Function's Flink DataStream SDK

2020-09-18 Thread Tzu-Li (Gordon) Tai (Jira)
Tzu-Li (Gordon) Tai created FLINK-19290: --- Summary: Add documentation for Stateful Function's Flink DataStream SDK Key: FLINK-19290 URL: https://issues.apache.org/jira/browse/FLINK-19290 Project:

[jira] [Created] (FLINK-19289) K8s resource manager terminated pod garbage collection

2020-09-18 Thread Yi Tang (Jira)
Yi Tang created FLINK-19289: --- Summary: K8s resource manager terminated pod garbage collection Key: FLINK-19289 URL: https://issues.apache.org/jira/browse/FLINK-19289 Project: Flink Issue Type: Bug

[jira] [Created] (FLINK-19288) Make InternalTimeServiceManager an interface

2020-09-18 Thread Dawid Wysakowicz (Jira)
Dawid Wysakowicz created FLINK-19288: Summary: Make InternalTimeServiceManager an interface Key: FLINK-19288 URL: https://issues.apache.org/jira/browse/FLINK-19288 Project: Flink Issue Ty