Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-09 Thread Robert Metzger
It seems that we'd have to use invite links on the Flink website for people to join our Slack (1). These links can be configured to have no time-expiration, but they will expire after 100 guests have joined. I guess we'd have to use a URL shortener (https://s.apache.org) that we update once the invi

Re: RichAsyncFunction + Cache or Map State?

2022-05-09 Thread Dan Hill
Hi. Any advice on this? I just hit this too. Some ideas: 1. Manage our own separate cache (disk, Redis, etc). 2. Use two operators (the first a cache operator, the second the RichAsyncFunction). Have a feedback loop by using another Kafka topic or S3 file source/sink. On Wed, Feb 9, 2022 at
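Idea 1 above (a separately managed cache in front of the asynchronous lookup) could look roughly like the following. This is a minimal sketch in plain Java, not Flink code: the class name CachedAsyncLookup and the loader function are hypothetical, the cache lives on the heap rather than on disk or in Redis, and it is not part of Flink's checkpointed state, so it starts empty after a restart.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper: memoizes the future returned by an async loader, per key.
final class CachedAsyncLookup<K, V> {
    private final ConcurrentHashMap<K, CompletableFuture<V>> cache = new ConcurrentHashMap<>();
    private final Function<K, CompletableFuture<V>> loader;

    CachedAsyncLookup(Function<K, CompletableFuture<V>> loader) {
        this.loader = loader;
    }

    // Returns the cached future for the key, invoking the loader only on a miss.
    // Simplification: entries never expire and failed lookups stay cached.
    CompletableFuture<V> get(K key) {
        return cache.computeIfAbsent(key, loader);
    }
}
```

Inside an async operator, the callback would complete its result from the future returned by get(key). Caching the future instead of the value means concurrent requests for the same key share one in-flight lookup.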

How to get flink to use POJO serializer when enum is present in POJO class

2022-05-09 Thread Tejas B
Hi, I am trying to get Flink schema evolution to work for me using the POJO serializer. But I found out that if an enum is present in the POJO, then the POJO serializer is not used. An example of my POJO is as follows: public class Rule { String id; int val; RuleType ruleType; //Newly added field//int val2
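For reference, Flink's documented POJO rules are: the class is public, it has a public no-argument constructor, and every field is either public or reachable through a getter and a setter. In the truncated excerpt above the fields carry no access modifier, so if the full class also lacks accessors, that alone (rather than the enum) could explain the fallback. The sketch below is plain Java and only illustrates those rules: looksLikePojo is a hypothetical helper, not Flink's type extraction, and the RuleType constants are made up. It ignores inherited fields and boolean isX getters.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

final class PojoRuleCheck {
    // Enum constants are hypothetical; the source does not show them.
    public enum RuleType { THRESHOLD, RATE }

    // Variant of the Rule class that satisfies the POJO rules via public fields.
    public static class Rule {
        public String id;
        public int val;
        public RuleType ruleType;
        public Rule() {}
    }

    // Mirrors the excerpt: no access modifiers, no accessors.
    static class PackagePrivateRule {
        String id;
        int val;
    }

    // Simplified check of the documented POJO rules (assumption: declared fields only).
    static boolean looksLikePojo(Class<?> type) {
        if (!Modifier.isPublic(type.getModifiers())) return false;
        try {
            type.getConstructor(); // throws if there is no public no-arg constructor
        } catch (NoSuchMethodException e) {
            return false;
        }
        for (Field field : type.getDeclaredFields()) {
            int mods = field.getModifiers();
            if (Modifier.isStatic(mods) || Modifier.isTransient(mods) || field.isSynthetic()) continue;
            if (!Modifier.isPublic(mods) && !hasGetterAndSetter(type, field)) return false;
        }
        return true;
    }

    private static boolean hasGetterAndSetter(Class<?> type, Field field) {
        String name = field.getName();
        String suffix = Character.toUpperCase(name.charAt(0)) + name.substring(1);
        try {
            type.getMethod("get" + suffix);
            type.getMethod("set" + suffix, field.getType());
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}
```

To see what Flink itself decides for a class, TypeInformation.of(Rule.class) in a project with Flink on the classpath reports the type information it extracted.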

Re: What causes a task to change parallelism?

2022-05-09 Thread Caizhi Weng
Hi! I can't see the image (if there is any) in the email. But from the description it is related to the arrow labeled GLOBAL. A global shuffle collects all records from its upstream and aggregates them in its downstream. There are several SQL patterns which lead to this type of shuffle, for exampl

Re: OOM errors cause by the new KafkaSink API

2022-05-09 Thread Hua Wei Chen
Hi Martijn, Thanks for your response. > What's the Flink version that you're using? Our Flink version is 1.14.4 and the Scala version is 2.12.12. > Could you also separate the two steps (switching from the old Kafka interfaces to the new ones + modifying serializers) to determine which of the tw

Re: FW: Rabbitmq Connection error with Flink version(1.15.0)

2022-05-09 Thread Dian Fu
Hi Harshit, You should use https://repo.maven.apache.org/maven2/org/apache/flink/flink-sql-connector-rabbitmq/1.15.0/flink-sql-connector-rabbitmq-1.15.0.jar which is a fat jar containing all the dependencies. Regards, Dian On Mon, May 9, 2022 at 10:05 PM harshit.varsh...@iktara.ai < harshit.vars

Re: trigger once (batch job with streaming semantics)

2022-05-09 Thread Georg Heiler
Hi Martijn, many thanks for this clarification. Do you know of any example somewhere which would showcase such an approach? Best, Georg On Mon, May 9, 2022 at 14:45, Martijn Visser < martijnvis...@apache.org> wrote: > Hi Georg, > > No they wouldn't. There is no capability out of the box th

What causes a task to change parallelism?

2022-05-09 Thread Jason Politis
Good evening all, We are running a job in Flink SQL. We've confirmed all Kafka topics that we are sourcing from have 5 partitions. All source tasks in the larger DAG, of which we're only showing a small portion below, have a parallelism of 5. But for some reason, this one little guy here

Re: Flink-SQL returning duplicate rows for some records

2022-05-09 Thread Joost Molenaar
Hi Leonard and Martijn, thanks for looking into this. I ran into the issue on Flink 1.14.4 (with the matching flink-sql-connector-kafka based on Scala 2.11), but reproduced the problem today in 1.15.0 (again with the matching flink-sql-connector-kafka). I haven't used older versions than 1.14.4.

FW: Rabbitmq Connection error with Flink version(1.15.0)

2022-05-09 Thread harshit.varsh...@iktara.ai
From: harshit.varsh...@iktara.ai [mailto:harshit.varsh...@iktara.ai] Sent: Monday, May 9, 2022 7:33 PM To: 'user@flink.apache.org' Cc: 'harshit.varsh...@iktara.ai' Subject: Rabbitmq Connection error with Flink version(1.15.0) Dear Team, I am new to pyflink and request for your suppor

Re: Notify on 0 events in a Tumbling Event Time Window

2022-05-09 Thread Shilpa Shankar
Thanks Andrew. We did consider this solution too. Unfortunately we do not have permissions to generate artificial Kafka events in our ecosystem. Dario, Thanks for your inputs. We will give your design a try. Due to the number of events being processed per window, we are using incremental aggregate fu

Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-09 Thread Robert Metzger
Thanks a lot for your answer. The onboarding experience to the ASF Slack is indeed not ideal: https://apisix.apache.org/docs/general/join#join-the-slack-channel I'll see if we can improve it. On Mon, May 9, 2022 at 3:38 PM Martijn Visser wrote: > As far as I recall you can't sign up for the ASF i

Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-09 Thread Martijn Visser
As far as I recall you can't sign up for the ASF instance of Slack, you can only get there if you're a committer or if you're invited by a committer. On Mon, 9 May 2022 at 15:15, Robert Metzger wrote: > Sorry for joining this discussion late, and thanks for the summary Xintong! > > Why are we co

Re: Notify on 0 events in a Tumbling Event Time Window

2022-05-09 Thread Dario Heinisch
It depends on the use case; in Shilpa's use case it is about users, so the user IDs are probably known beforehand. https://dpaste.org/cRe3G <= This is an example without a window, but essentially, Shilpa, you would be re-registering the timers every time they fire. You would also have to ingest
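The re-registering timer idea can be illustrated without Flink. The following is a minimal sketch in plain Java that simulates it for a single key: a timer is set at the end of each tumbling window, fires once the watermark reaches it, reports the window if no event arrived, and is then re-registered for the next window. The class and method names are hypothetical, the event timestamps are assumed to be sorted ascending, and in a real job this logic would sit in a keyed process function with state rather than a loop.

```java
import java.util.ArrayList;
import java.util.List;

final class EmptyWindowDetector {
    // Returns the start timestamps of tumbling windows that closed (window end <= watermark)
    // without receiving any event. Assumes sortedEventTimes is in ascending order.
    static List<Long> emptyWindowStarts(long[] sortedEventTimes, long windowSizeMs,
                                        long firstWindowStart, long watermark) {
        if (windowSizeMs <= 0) {
            throw new IllegalArgumentException("windowSizeMs must be positive");
        }
        List<Long> alerts = new ArrayList<>();
        int next = 0;
        // Ignore events that precede the first window.
        while (next < sortedEventTimes.length && sortedEventTimes[next] < firstWindowStart) {
            next++;
        }
        long timer = firstWindowStart + windowSizeMs; // first timer: end of the first window
        while (timer <= watermark) {                   // the timer "fires"
            int seen = 0;
            while (next < sortedEventTimes.length && sortedEventTimes[next] < timer) {
                next++;
                seen++;
            }
            if (seen == 0) {
                alerts.add(timer - windowSizeMs);      // window [timer - size, timer) was empty
            }
            timer += windowSizeMs;                     // re-register for the next window
        }
        return alerts;
    }
}
```

With events at 1000, 1500 and 12000 ms, a 5000 ms window starting at 0 and a watermark of 15000, only the window starting at 5000 is reported.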

Re: Notify on 0 events in a Tumbling Event Time Window

2022-05-09 Thread Andrew Otto
This sounds similar to a non-streaming problem we had at WMF. We ingest all event data from Kafka into HDFS/Hive and partition the Hive tables in hourly directories. If there are no events in a Kafka topic for a given hour, we have no way of knowing if the hour has been ingested successfully. Fo

Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-09 Thread Robert Metzger
Sorry for joining this discussion late, and thanks for the summary Xintong! Why are we considering a separate slack instance instead of using the ASF Slack instance? The ASF instance is paid, so all messages are retained forever, and quite a few people are already on that Slack instance. There is

Notify on 0 events in a Tumbling Event Time Window

2022-05-09 Thread Shilpa Shankar
Hello, We are building a Flink use case where we are consuming from a Kafka topic and performing aggregations and generating alerts based on average, max, min thresholds. We also need to notify the users when there are 0 events in a Tumbling Event Time Window. We are having trouble coming up with

Re: trigger once (batch job with streaming semantics)

2022-05-09 Thread Martijn Visser
Hi Georg, No they wouldn't. There is no capability out of the box that lets you start Flink in streaming mode, run everything that's available at that moment and then stop when there's no data anymore. You would need to trigger the stop yourself. Best regards, Martijn On Fri, 6 May 2022 at 13:

Re: Practical guidance with Scala and Flink >= 1.15

2022-05-09 Thread Martijn Visser
Hi Salva, Like Robert said, I don't expect that we will be able to drop support for Scala 2.12 anytime soon. I do think that we should have a discussion in the Flink community about providing Scala APIs. My opinion is that we are probably better off to deprecate the current Scala APIs (keeping it

Re:WaterMark that defined in DDL does not work

2022-05-09 Thread Xuyang
Hi, Huang. I tested the SQL with the connector 'datagen', and the watermark shows up in the web UI. You can change "WATERMARK FOR createtime AS createtime - INTERVAL '5' SECOND" to "WATERMARK FOR createtime AS createtime" and ensure all subtasks contain data for testing. At 2022-05-07 16:41:36, "JianWen H

Re: Flink serialization errors at a batch job

2022-05-09 Thread Robert Metzger
Hi, I suspect that this error is not caused by Flink code (because our serializer stack is fairly stable, there would be more users reporting such issues if it was a bug in Flink). In my experience, these issues are caused by broken serializer implementations (e.g. a serializer being used by multi

Re: Practical guidance with Scala and Flink >= 1.15

2022-05-09 Thread Robert Metzger
Hi Salva, my somewhat wild guess (because I'm not very involved with the Scala development on Flink): I would stick with option 1 for now. It should be easier now for the Flink community to support Scala versions past 2.12 (because we don't need to worry about scala 2.12+ support for Flink's intern

Re: Issue with HybridSource recovering from Savepoint

2022-05-09 Thread Arvid Heise
I'm not sure why recovery from a savepoint would be different than from a checkpoint but if you look for a savepoint test case, PTAL at [1]. I rather think you found some edge case in your recovery setup. Changed degree of parallelism certainly sounds like the most likely option. Or did you upgrad