Re: Feature Request: Upgrade Kafka Library

2022-01-07 Thread Martijn Visser
Hi Clayton, There is an open ticket and PR to update to 2.8.1. You can track it under https://issues.apache.org/jira/browse/FLINK-25504 Best regards, Martijn Op vr 7 jan. 2022 om 21:36 schreef Clayton Wohl > The latest version of flink-connector-kafka, still uses kafka-clients > 2.4.1. There

Re: Windowing on the consumer side

2022-01-07 Thread Flink Lover
Hi, I tried the code below but it throws an error. val src: DataStream[String] = env.addSource(consumer).windowAll(TumblingEventTimeWindows.of(Time.seconds(2))) // reading data and used data distribution strategy src.process(new JSONParsingProcessFunction()).uid("sink") Error: type mismatch;

Re: Flink rest api to start a job

2022-01-07 Thread Yun Gao
Hi Qihua Sorry may I double confirm that whether the entry class exists in both testA and testB? IF testA.jar is included on startup, it would be loaded in the parent classloader, which is the parent classloader for the user classloader that loads testB. Thus at least if the entry-class is ex

Re: Windowing on the consumer side

2022-01-07 Thread Flink Lover
Can somebody help me with this? I tried several examples where they have extracted the key from Json and used windowing but as far as I have learnt, with windowing I will have to use some kind of aggregation but in my use case there is no aggregation to be performed. I just have to get data for eve

Re: reinterpretAsKeyedStream and Elastic Scaling in Reactive Mode

2022-01-07 Thread Martin
I changed my flink job having an explicit keyBy instead of reinterpretAsKeyedStream.Situation is still the same, so its no problem with combination of reinterpretAsKeyedStream and Elastic Scaling in Reactive Mode. This time I was able to check the logs of the task managers and it seems to be a seri

Feature Request: Upgrade Kafka Library

2022-01-07 Thread Clayton Wohl
The latest version of flink-connector-kafka, still uses kafka-clients 2.4.1. There have been a lot of upgrades in the Kafka consumer/producer library since then. May I request that the Flink project upgrade to a recent version of the Kafka library? thanks!

Re: Skewed Data when joining tables using Flink SQL

2022-01-07 Thread Anne Lai
Hi Yun, Thanks for correcting the wrong assumption, I do plan to use BATCH execution mode. It’s good to know that Flink is able to process the broadcasted state first, although I’m still a bit concerned about the state size. Besides this, did I miss any approach or trick that can be applied in BA

Custom Kafka Keystore on Amazon Kinesis Analytics

2022-01-07 Thread Clayton Wohl
If I want to migrate from FlinkKafkaConsumer to KafkaSource, does the latter support this: https://docs.aws.amazon.com/kinesisanalytics/latest/java/example-keystore.html Basically, I'm running my Flink app in Amazon's Kinesis Analytics hosted Flink environment. I don't have reliable access to the

Re: Trigger the producer to send data to the consumer after mentioned seconds

2022-01-07 Thread David Morávek
you’re using compile target lower then 1.8, what needs to be done depends on your build tool On Fri 7. 1. 2022 at 20:05, Flink Lover wrote: > Hi David, > > Thanks for your explanation! > > I am familiar with how JVM works but why am I facing this issue? What > exactly needs to be done? > > Thank

[no subject]

2022-01-07 Thread sudhansu jena
Unsubscribe

Re: Avro BulkFormat for the new FileSource API?

2022-01-07 Thread David Morávek
Hi Kevin, I'm not as familiar with initiatives around the new sources, but it seems that the BulkFormat for Avro [1] has been added recently and will be released with the Flink 1.15.x. [1] https://issues.apache.org/jira/browse/FLINK-24565 Best, D. On Fri, Jan 7, 2022 at 7:23 PM Kevin Lam wrote

Re: Trigger the producer to send data to the consumer after mentioned seconds

2022-01-07 Thread Flink Lover
Hi David, Thanks for your explanation! I am familiar with how JVM works but why am I facing this issue? What exactly needs to be done? Thanks, Martin O. On Sat, Jan 8, 2022 at 12:19 AM David Morávek wrote: > Hi Siddhesh, > > any JVM based language (Java, Scala, Kotlin) compiles into a byte-co

Re: Trigger the producer to send data to the consumer after mentioned seconds

2022-01-07 Thread David Morávek
Hi Siddhesh, any JVM based language (Java, Scala, Kotlin) compiles into a byte-code that can be executed by the JVM. As the JVM was evolving over the years, new versions of byte code have been introduced. Target version simply refers the the version of bytecode the compiler should generate. How to

Avro BulkFormat for the new FileSource API?

2022-01-07 Thread Kevin Lam
Hi all, We're looking into using the new FileSource API, we see that there is a BulkFormat

Re: [E] Re: Metaspace OOM : class loaders not being GC

2022-01-07 Thread David Clutter
Thanks for the responses. I did switch to per-job mode and it is working well of course. I suspected there wouldn't be an easy solution, but I had to ask. Thanks! On Fri, Jan 7, 2022 at 3:37 AM David Morávek wrote: > Hi David, > > If I understand the problem correctly, there is really nothing

Windowing on the consumer side

2022-01-07 Thread Flink Lover
I have an incoming json data like below: {"custId": 1,"custFirstName":"Martin", "custLastName":"owen","edl_created_at":"2022-03-01 00:00:00"} Now, this record has been pushed successfully via producer to the consumer. But I am willing to get records of say 2 seconds window but I don't have any key

Re: Trigger the producer to send data to the consumer after mentioned seconds

2022-01-07 Thread Flink Lover
Could you please help me with this? On Fri, Jan 7, 2022 at 11:48 AM Flink Lover wrote: > I tried Flink version 1.14.2 / 1.13.5 > > On Fri, Jan 7, 2022 at 11:46 AM Flink Lover > wrote: > >> Also, I am using flink-connector-kafka_2.11 >> >> val consumer = new FlinkKafkaConsumer[String]("topic_nam

Re: Skewed Data when joining tables using Flink SQL

2022-01-07 Thread Yun Gao
Hi Anne, For one thing, for the datastream and broadcast state method, May I have a double confirmation that are you using BATCH execution mode? I think with [1] for BATCH mode it should be able to first process the broadcast side before the non-broadcast side. Best, Yun [1] https://issues.

Re: Serving Machine Learning models

2022-01-07 Thread Yun Gao
Hi Sonia, Sorry I might not have the statistics on the provided two methods, perhaps as input I could also provide another method: currently there is an eco-project dl-on-flink that supports running DL frameworks on top of the Flink and it will handle the data exchange between java and python p

Plans to update StreamExecutionEnvironment.readFiles to use the FLIP-27 compatible FileSource?

2022-01-07 Thread Kevin Lam
Hi all, Are there any plans to update StreamExecutionEnvironment.readFiles to u

Re: Exactly Once Semantics

2022-01-07 Thread Siddhesh Kalgaonkar
Hey Yun, Thanks for your quick response. Much appreciated. I have replied to your answer on SO and I will continue with my doubts over there. Thanks, Sid On Fri, Jan 7, 2022 at 9:05 PM Yun Gao wrote: > Hi Siddhesh, > > I answered on the stackoverflow and I also copied the answers here for > re

Re: Exactly Once Semantics

2022-01-07 Thread Yun Gao
Hi Siddhesh, I answered on the stackoverflow and I also copied the answers here for reference: For the producer side, Flink Kafka Consumer would bookkeeper the current offset in the distributed checkpoint, and if the consumer task failed, it will restarted from the latest checkpoint and re-e

Re: Exactly Once Semantics

2022-01-07 Thread Siddhesh Kalgaonkar
Hi Martijn, Understood. If possible please help me out with the problem. Thanks, Sid On Fri, Jan 7, 2022 at 8:45 PM Martijn Visser wrote: > Hi Siddesh, > > The purpose of both Stackoverflow and the mailing list is to solve a > question or a problem, the mailing list is not for getting attentio

Re: RowType for complex types in Parquet File

2022-01-07 Thread Jing Ge
Hi Meghajit, like the exception described, parquet schema with nested columns is not supported currently. It is on our todo list with high priority. Best regards Jing On Fri, Jan 7, 2022 at 6:12 AM Meghajit Mazumdar < meghajit.mazum...@gojek.com> wrote: > Hello, > > Flink documentation mentions

Re: Exactly Once Semantics

2022-01-07 Thread David Morávek
Also please note that the Apache mailing lists are also indexed by search engines and publicly archived [1]. [1] https://lists.apache.org/list.html?user@flink.apache.org On Fri, Jan 7, 2022 at 4:15 PM Martijn Visser wrote: > Hi Siddesh, > > The purpose of both Stackoverflow and the mailing list

Re: Exactly Once Semantics

2022-01-07 Thread Martijn Visser
Hi Siddesh, The purpose of both Stackoverflow and the mailing list is to solve a question or a problem, the mailing list is not for getting attention. It equivalents crossposting, which we rather don't. As David mentioned, time is limited and we all try to spent it the best we can. Best regards,

Re: Exactly Once Semantics

2022-01-07 Thread Siddhesh Kalgaonkar
Hi David, It's actually better in my opinion. Because people who are not aware of the ML thread can Google and check the SO posts when they come across any similar problems. The reason behind posting on ML is to get attention. Because few questions are unanswered for multiple days and since we are

Re: Exactly Once Semantics

2022-01-07 Thread David Morávek
Hi Siddhesh, can you please focus your questions on one channel only? (either SO or the ML) this could lead to unnecessary work duplication (which would be shame, because the community has limited resources) as people answering on SO might not be aware of the ML thread D. On Fri, Jan 7, 2022 at

Exactly Once Semantics

2022-01-07 Thread Siddhesh Kalgaonkar
I am trying to achieve exactly one semantics using Flink and Kafka. I have explained my scenario thoroughly in this post https://stackoverflow.com/questions/70622321/exactly-once-in-flink-kafka-producer-and-consumer Any help is much appreciated! Thanks, Sid

Re: [ANNOUNCE] Apache Flink ML 2.0.0 released

2022-01-07 Thread David Morávek
Great job! <3 Thanks Dong and Yun for managing the release and big thanks to everyone who has contributed! Best, D. On Fri, Jan 7, 2022 at 2:27 PM Yun Gao wrote: > The Apache Flink community is very happy to announce the release of Apache > Flink ML 2.0.0. > > > > Apache Flink ML provides API a

[ANNOUNCE] Apache Flink ML 2.0.0 released

2022-01-07 Thread Yun Gao
The Apache Flink community is very happy to announce the release of Apache Flink ML 2.0.0. Apache Flink ML provides API and infrastructure that simplifies implementing distributed ML algorithms, and it also provides a library of off-the-shelf ML algorithms. Please check out the release blog po

Re: adding elapsed times to events that form a transaction

2022-01-07 Thread HG
I am watching a ververica youtube playlist first Already did the rides-and-fares stuff. Will certainly look into these. Thanks Ali Op vr 7 jan. 2022 om 11:32 schreef Ali Bahadir Zeybek : > Hello Hans, > > If you would like to see some hands-on examples which showcases the > capabilities of Fli

Re: adding elapsed times to events that form a transaction

2022-01-07 Thread David Anderson
One way to solve this with Flink SQL would be to use MATCH_RECOGNIZE. [1] is an example illustrating a very similar use case. [1] https://stackoverflow.com/a/62122751/2000823 On Fri, Jan 7, 2022 at 11:32 AM Ali Bahadir Zeybek wrote: > Hello Hans, > > If you would like to see some hands-on examp

Re: adding elapsed times to events that form a transaction

2022-01-07 Thread HG
Super Then it will not be a waste of time to learn flink. Thanks! Op vr 7 jan. 2022 om 11:13 schreef Francesco Guardiani < france...@ververica.com>: > So in Flink we essentially have 2 main APIs to define stream topologies: > one is DataStream and the other one is Table API. My guess is that rig

Re: adding elapsed times to events that form a transaction

2022-01-07 Thread Ali Bahadir Zeybek
Hello Hans, If you would like to see some hands-on examples which showcases the capabilities of Flink, I would suggest you follow the training exercises[1]. To be more specific, checkpointing[2] example implements a similar logic to what you have described. Sincerely, Ali [1]: https://github.co

Re: reinterpretAsKeyedStream and Elastic Scaling in Reactive Mode

2022-01-07 Thread Martin
Thanks David for the hints. I checked the usage of the state API and for me it seems to be correct, but I am a new Flink users. Checkpoints happen eachs minute, the scaleing I trigger after 30 minutes.The source and sink are Kafka topics in EXACTLY_ONCE mode.   I tried to simplify the code, but did

Re: adding elapsed times to events that form a transaction

2022-01-07 Thread Francesco Guardiani
So in Flink we essentially have 2 main APIs to define stream topologies: one is DataStream and the other one is Table API. My guess is that right now you're trying to use DataStream with the Kafka connector. DataStream allows you to statically define a stream topology, with an API in a similar fas

Re: adding elapsed times to events that form a transaction

2022-01-07 Thread HG
Hi Francesco. I am not using anything right now apart from Kafka. Just need to know whether Flink is capable of doing this and trying to understand the documentation and terminology etc. I grapple a bit to understand the whole picture. Thanks Regards Hans Op vr 7 jan. 2022 om 09:24 schreef Fran

Re: reinterpretAsKeyedStream and Elastic Scaling in Reactive Mode

2022-01-07 Thread David Morávek
OK, my first intuition would be some kind of misuse of the state API. Other guess would be, has any checkpoint completed prior triggering of the re-scaling event? I'll also try to verify the scenario you've described, but these would be the things that I'd check first. D. On Fri, Jan 7, 2022 at

Re: reinterpretAsKeyedStream and Elastic Scaling in Reactive Mode

2022-01-07 Thread Martin
Hello David, right now I cant share the complete code. But I will try in some days to simplify it and reduce the code to still trigger the issue. First I will check, if the explict keyBy instead of the reinterpretAsKeyedStream  fix the issue.If yes, that would assume - for me - that its a bug with

Re: Metaspace OOM : class loaders not being GC

2022-01-07 Thread David Morávek
Hi David, If I understand the problem correctly, there is really nothing we can do here. Soft references are garbage collected when there is a high memory pressure and the garbage collector needs to free up more memory. The problem here is that the GC doesn't really take high memory pressure on Me

Serving Machine Learning models

2022-01-07 Thread Sonia-Florina Horchidan
Hello, I recently started looking into serving Machine Learning models for streaming data in Flink. To give more context, that would involve training a model offline (using PyTorch or TensorFlow), and calling it from inside a Flink job to do online inference on newly arrived data. I have found

Re: reinterpretAsKeyedStream and Elastic Scaling in Reactive Mode

2022-01-07 Thread David Morávek
Would you be able share the code of your test by any chance? Best, D. On Fri, Jan 7, 2022 at 10:06 AM Martin wrote: > Hello David, > > I have a test setup, where the input is all the time the same. > After processing, I check all the output if each sequence number ist just > used once. > > Anot

Re: Flink Kinesis Producer con't connect with AWS credentials

2022-01-07 Thread Matthias Pohl
I'm adding Danny to this thread. He might be able to help on this topic. Best, Matthias On Mon, Jan 3, 2022 at 4:57 PM Daniel Vol wrote: > I definitely do, and you can see in my initial post that this is the first > thing I tried but I got warnings and it doesn't use credentials I supplied. > T

Skewed Data when joining tables using Flink SQL

2022-01-07 Thread Anne Lai
Hi, I have a Flink batch job that needs to join a large skewed table with a smaller table, and because records are not evenly distributed to each subtask, it always fails with a "too much data in partition" error. I explored using DataStream API to broadcast the smaller tables as a broadcast state

Re: reinterpretAsKeyedStream and Elastic Scaling in Reactive Mode

2022-01-07 Thread Martin
Hello David, I have a test setup, where the input is all the time the same.After processing, I check all the output if each sequence number ist just used once. Another output field is a random UUID generated on startup of a Task (in the open-method of the (c)-keyed process function).In the output I

Re: reinterpretAsKeyedStream and Elastic Scaling in Reactive Mode

2022-01-07 Thread David Morávek
Hi Martin, _reinterpretAsKeyedStream_ should definitely work with the reactive mode, if it doesn't it's a bug that needs to be fixed > For test use cases (3) and (4) the state of the keyed process function (c) > seems only available for around 50% of the events processed after > scale-in/fail. >

Re: Moving off of TypeInformation in Flink 1.11

2022-01-07 Thread Francesco Guardiani
Hi Sofya, DataStream API doesn't use DataTypes, but it still uses TypeInformation. DataTypes and LogicalTypes are relevant only for Table API. If I understood what you're trying to do, you don't need to manually transform to Row, but you only need to define the Schema when crossing the boundary fr

Re: adding elapsed times to events that form a transaction

2022-01-07 Thread Francesco Guardiani
Hi, Are you using SQL or DataStream? For SQL you can use the Window TVF feature, where the window size is the "max" elapsed time, and then inside the window you pick the beginning and end event and j