Re: [DISCUSS][FLINK-20726] Introduce Pulsar connector

2021-01-06 Thread Sijie Guo
Hi Till, Thank you for your email! Please find my comments inline. On Mon, Dec 28, 2020 at 5:50 AM Till Rohrmann wrote: > Hi Jianyun, > > Thanks a lot for reviving this discussion. I think it would be great to > have a well working Pulsar connector for Flink. Before diving into the > detailed

Re: Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-06 Thread Yun Gao
Hi Arvid, Very thanks for the deep thoughts ! > If this somehow works, we would not need to change much in the checkpoint > coordinator. He would always inject into sources. We could also ignore the > race conditions as long as the TM lives. Checkpointing times are also not > worse as with the

[jira] [Created] (FLINK-20877) Refactor BytesHashMap and BytesMultiMap to support window key

2021-01-06 Thread Jark Wu (Jira)
Jark Wu created FLINK-20877: --- Summary: Refactor BytesHashMap and BytesMultiMap to support window key Key: FLINK-20877 URL: https://issues.apache.org/jira/browse/FLINK-20877 Project: Flink Issue

[jira] [Created] (FLINK-20876) Separate the implementation of StreamExecTemporalJoin

2021-01-06 Thread Wenlong Lyu (Jira)
Wenlong Lyu created FLINK-20876: --- Summary: Separate the implementation of StreamExecTemporalJoin Key: FLINK-20876 URL: https://issues.apache.org/jira/browse/FLINK-20876 Project: Flink Issue

[DISCUSS] FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements

2021-01-06 Thread Yangze Guo
Hi, there, We would like to start a discussion thread on "FLIP-156: Runtime Interfaces for Fine-Grained Resource Requirements"[1], where we propose Slot Sharing Group (SSG) based runtime interfaces for specifying fine-grained resource requirements. In this FLIP: - Expound the user story of

[jira] [Created] (FLINK-20875) Could patch CVE-2020-17518 to version 1.10

2021-01-06 Thread Wong Mulan (Jira)
Wong Mulan created FLINK-20875: -- Summary: Could patch CVE-2020-17518 to version 1.10 Key: FLINK-20875 URL: https://issues.apache.org/jira/browse/FLINK-20875 Project: Flink Issue Type: Bug

[jira] [Created] (FLINK-20874) Python DataStreamTests.test_key_by_on_connect_stream test failed with "ArrayIndexOutOfBoundsException"

2021-01-06 Thread Huang Xingbo (Jira)
Huang Xingbo created FLINK-20874: Summary: Python DataStreamTests.test_key_by_on_connect_stream test failed with "ArrayIndexOutOfBoundsException" Key: FLINK-20874 URL:

[jira] [Created] (FLINK-20873) Upgrade Calcite version to 1.27

2021-01-06 Thread Jark Wu (Jira)
Jark Wu created FLINK-20873: --- Summary: Upgrade Calcite version to 1.27 Key: FLINK-20873 URL: https://issues.apache.org/jira/browse/FLINK-20873 Project: Flink Issue Type: Improvement

Re: Support local aggregate push down for Blink batch planner

2021-01-06 Thread Jark Wu
Thanks for updating the design doc. It looks good to me. Best, Jark On Thu, 7 Jan 2021 at 10:16, Jingsong Li wrote: > Sounds good to me. > > We don't have to worry about future changes, because it has covered all > the capabilities of calcite aggregation. > > Best, > Jingsong > > On Thu, Jan

[jira] [Created] (FLINK-20872) Job resume from history savepoint when failover if checkpoint is disabled

2021-01-06 Thread Liu (Jira)
Liu created FLINK-20872: --- Summary: Job resume from history savepoint when failover if checkpoint is disabled Key: FLINK-20872 URL: https://issues.apache.org/jira/browse/FLINK-20872 Project: Flink

Re: Support local aggregate push down for Blink batch planner

2021-01-06 Thread Jingsong Li
Sounds good to me. We don't have to worry about future changes, because it has covered all the capabilities of calcite aggregation. Best, Jingsong On Thu, Jan 7, 2021 at 12:14 AM Sebastian Liu wrote: > Hi Jark, > > Sounds good to me. For better scalability in the future, we could add the >

[jira] [Created] (FLINK-20871) Make DataStream#executeAndCollectWithClient public

2021-01-06 Thread Steven Zhen Wu (Jira)
Steven Zhen Wu created FLINK-20871: -- Summary: Make DataStream#executeAndCollectWithClient public Key: FLINK-20871 URL: https://issues.apache.org/jira/browse/FLINK-20871 Project: Flink Issue

Re:Re: Task scheduling of Flink

2021-01-06 Thread penguin.
Hi Till, Thank you for your reply. I found such a chain of method calls: JobMaster#startScheduling -> SchedulerBase#startScheduling -> DefaultScheduler#startSchedulingInternal -> EagerSchedulingStrategy#startScheduling -> EagerSchedulingStrategy#allocateSlotsAndDeploy ->

[jira] [Created] (FLINK-20870) FlinkKafkaSink

2021-01-06 Thread xx chai (Jira)
xx chai created FLINK-20870: --- Summary: FlinkKafkaSink Key: FLINK-20870 URL: https://issues.apache.org/jira/browse/FLINK-20870 Project: Flink Issue Type: Improvement Components: API /

Re: Apache Pinot Sink

2021-01-06 Thread Venkata Sanath Muppalla
+1 As Yupeng mentioned, we at Uber are also looking into the Pinot Sink. It would be great to collaborate on this proposal. Thanks, Sanath On Wed, Jan 6, 2021 at 9:23 AM Yupeng Fu wrote: > Hi Mats, > > Glad to see this interest! We at Uber are also working on a Pinot sink (for > BATCH

Is development in FlinkML still active?

2021-01-06 Thread Badrul Chowdhury
Hi, I see that the last update to the roadmap for FlinkML was some time ago (2016). Is development still active? If so, I would like to contribute some unsupervised clustering algorithms like CLARANS. Would love some pointers. Thanks, Badrul

[jira] [Created] (FLINK-20869) Support TIMESTAMP WITH TIME ZONE

2021-01-06 Thread Timo Walther (Jira)
Timo Walther created FLINK-20869: Summary: Support TIMESTAMP WITH TIME ZONE Key: FLINK-20869 URL: https://issues.apache.org/jira/browse/FLINK-20869 Project: Flink Issue Type: Sub-task

Re: Apache Pinot Sink

2021-01-06 Thread Yupeng Fu
Hi Mats, Glad to see this interest! We at Uber are also working on a Pinot sink (for BATCH execution), with some help from the Pinot community on abstracting Pinot interfaces for segment writes and catalog retrieval. Perhaps we can collaborate on this proposal/POC. Cheers, Yupeng On Wed, Jan

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-06 Thread Arvid Heise
Okay then at least you guys are in sync ;) (Although I'm also not too far away) I hope I'm not super derailing but could we reiterate why it's good to get rid of finished tasks (note: I'm also mostly in favor of that): 1. We can free all acquired resources including buffer pools, state

[jira] [Created] (FLINK-20868) Pause the idle/back pressure timers during processing mailbox actions

2021-01-06 Thread Piotr Nowojski (Jira)
Piotr Nowojski created FLINK-20868: -- Summary: Pause the idle/back pressure timers during processing mailbox actions Key: FLINK-20868 URL: https://issues.apache.org/jira/browse/FLINK-20868 Project:

Re: Support local aggregate push down for Blink batch planner

2021-01-06 Thread Sebastian Liu
Hi Jark, Sounds good to me. For better scalability in the future, we could add the AggregateExpression. ``` public class AggregateExpression implements ResolvedExpression { private final FunctionDefinition functionDefinition; private final List args; private final @Nullable

[jira] [Created] (FLINK-20867) support ship files from local/remote path to k8s pod

2021-01-06 Thread Ruguo Yu (Jira)
Ruguo Yu created FLINK-20867: Summary: support ship files from local/remote path to k8s pod Key: FLINK-20867 URL: https://issues.apache.org/jira/browse/FLINK-20867 Project: Flink Issue Type: New

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-06 Thread Aljoscha Krettek
On 2021/01/06 16:05, Arvid Heise wrote: thanks for the detailed example. It feels like Aljoscha and you are also not fully aligned yet. For me, it sounded as if Aljoscha would like to avoid sending RPC to non-source subtasks. No, I think we need the triggering of intermediate operators. I was

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-06 Thread Aljoscha Krettek
On 2021/01/06 13:35, Arvid Heise wrote: I was actually not thinking about concurrent checkpoints (and actually want to get rid of them once UC is established, since they are addressing the same thing). I would give a yuge +1 to that. I don't see why we would need concurrent checkpoints in

Re: Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-06 Thread Arvid Heise
Hi Yun, thanks for the detailed example. It feels like Aljoscha and you are also not fully aligned yet. For me, it sounded as if Aljoscha would like to avoid sending RPC to non-source subtasks. I think we are still on the same page that we would like to trigger > checkpoint periodically until

Re: Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-06 Thread Yun Gao
Hi Arvid, Very thanks for the feedbacks! I'll try to answer the questions inline: > I'm also concerned about the notion of a final checkpoint. What happens > when this final checkpoint times out (checkpoint timeout > async timeout) > or fails for a different reason? I'm currently more inclined

Re: Apache Pinot Sink

2021-01-06 Thread Aljoscha Krettek
That's good to hear. I wasn't sure because the explanation focused a lot on checkpoints and the details of it while with the new Sink interface implementers don't need to be concerned with those. And in fact, when the Sink is used in BATCH execution mode there will be no checkpoints. Other

Re: Support local aggregate push down for Blink batch planner

2021-01-06 Thread Jark Wu
Hi Liu, Jingsong, Regarding the agg with filter, I think in theory we can support pushing such a pattern into source. We don't need to support it in the first version, but in the long term, we can support it. The designed interface should be future proof. Considering filter arg and distinct flag

[jira] [Created] (FLINK-20866) Add how to list jobs in Yarn deployment documentation when HA enabled

2021-01-06 Thread Yang Wang (Jira)
Yang Wang created FLINK-20866: - Summary: Add how to list jobs in Yarn deployment documentation when HA enabled Key: FLINK-20866 URL: https://issues.apache.org/jira/browse/FLINK-20866 Project: Flink

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-06 Thread Arvid Heise
I was actually not thinking about concurrent checkpoints (and actually want to get rid of them once UC is established, since they are addressing the same thing). But your explanation definitely helped me to better understand the race condition. However, I have the impression that you think

Re: [VOTE] Release 1.12.1, release candidate #1

2021-01-06 Thread Xintong Song
Thanks for driving the docker image efforts, Robert. +1 for canceling this RC. That should also give us the chance to fix FLINK-20781. I'll try to create the next release candidate as soon as the following blockers are resolved. * FLINK-20781: UnalignedCheckpointITCase failure caused by

Re: Support local aggregate push down for Blink batch planner

2021-01-06 Thread Sebastian Liu
Hi Jingsong, Jark, Thx so much for our discussion, and the cases mentioned above are really worthy for further discussion. 1. For aggregate with filter expressions: eg: select COUNT(1) FILTER(WHERE cc_call_center_sk > 3) from call_center; For the current Blink Planner, the optimized plan will

Re: [VOTE] Release 1.12.1, release candidate #1

2021-01-06 Thread Robert Metzger
The Docker images for Flink 1.12.0 are still not on Docker Hub (the official images), because they were not accepted by Docker. To avoid such issues in the future, we've decided to additionally publish the Docker images on GitHub as part of the release process (immediately) [1]. We still intend to

Re: Apache Pinot Sink

2021-01-06 Thread Poerschke, Mats
Yes, we will use the latest sink interface. Best, Mats > On 6. Jan 2021, at 11:05, Aljoscha Krettek wrote: > > It's great to see interest in this. Where you planning to use the new Sink > interface that we recently introduced? [1] > > Best, > Aljoscha > > [1] https://s.apache.org/FLIP-143 >

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-06 Thread Aljoscha Krettek
On 2021/01/06 11:30, Arvid Heise wrote: I'm assuming that this is the normal case. In a A->B graph, as soon as A finishes, B still has a couple of input buffers to process. If you add backpressure or longer pipelines into the mix, it's quite likely that a checkpoint may occur with B being the

Re: [DISCUSS][FLINK-20726] Introduce Pulsar connector

2021-01-06 Thread Arvid Heise
Hi Till, 1) Who from the Flink community will mentor this effort and could take > responsibility for it? > I'd be happy to mentor the transition. It remains to be seen who is doing mainly the maintenance in the long run. If all fails, I can also take that over but I was hoping that the

[jira] [Created] (FLINK-20865) Prevent potential resource deadlock in fine-grained resource management

2021-01-06 Thread Yangze Guo (Jira)
Yangze Guo created FLINK-20865: -- Summary: Prevent potential resource deadlock in fine-grained resource management Key: FLINK-20865 URL: https://issues.apache.org/jira/browse/FLINK-20865 Project: Flink

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-06 Thread Arvid Heise
> > I was referring to the case where intermediate operators don't have any > active upstream (input) operators. In that case, they basically become > the "source" of that part of the graph. In your example, M1 is still > connected to a "real" source. I'm assuming that this is the normal case.

Re: Apache Pinot Sink

2021-01-06 Thread Aljoscha Krettek
It's great to see interest in this. Where you planning to use the new Sink interface that we recently introduced? [1] Best, Aljoscha [1] https://s.apache.org/FLIP-143 On 2021/01/05 12:21, Poerschke, Mats wrote: Hi all, we want to contribute a sink connector for Apache Pinot. The following

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-06 Thread Aljoscha Krettek
On 2021/01/05 17:27, Arvid Heise wrote: For your question: will there ever be intermediate operators that should be running that are not connected to at least once source? I think there are plenty of examples if you go beyond chained operators and fully connected exchanges. Think of any fan-in,

[jira] [Created] (FLINK-20864) Apply exact matching rules in fulfilling resource requirement with slot resource

2021-01-06 Thread Yangze Guo (Jira)
Yangze Guo created FLINK-20864: -- Summary: Apply exact matching rules in fulfilling resource requirement with slot resource Key: FLINK-20864 URL: https://issues.apache.org/jira/browse/FLINK-20864

[jira] [Created] (FLINK-20863) Exclude network memory from ResourceProfile

2021-01-06 Thread Yangze Guo (Jira)
Yangze Guo created FLINK-20863: -- Summary: Exclude network memory from ResourceProfile Key: FLINK-20863 URL: https://issues.apache.org/jira/browse/FLINK-20863 Project: Flink Issue Type: Task

Re: Task scheduling of Flink

2021-01-06 Thread Till Rohrmann
Hi Penguin, What do you wanna do? If you want to change Flink's scheduling behaviour, then you can take a look at the implementations of SchedulerNG. Cheers, Till On Wed, Jan 6, 2021 at 6:58 AM penguin. wrote: > Hello! Do you know how to modify the task scheduling method of Flink?

[jira] [Created] (FLINK-20862) Add a converter from TypeInformation to DataType

2021-01-06 Thread Timo Walther (Jira)
Timo Walther created FLINK-20862: Summary: Add a converter from TypeInformation to DataType Key: FLINK-20862 URL: https://issues.apache.org/jira/browse/FLINK-20862 Project: Flink Issue Type:

Re: [ANNOUNCE] Weekly Community Update 2020/44-45

2021-01-06 Thread chohan
I wanted to check in again to see if a discussion has started around releasing 1.10.3. There are a few patches[0] in 1.10.3 that we are very eager to pick up. [0] https://issues.apache.org/jira/browse/FLINK-15467 and https://issues.apache.org/jira/browse/FLINK-19237 -- Sent from:

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-06 Thread Yun Gao
Hi Arvid, Very thanks for the feedbacks! > For 2) the race condition, I was more thinking of still injecting the > barrier at the source in all cases, but having some kind of short-cut to > immediately execute the RPC inside the respective taskmanager.