Re: stream consume from kafka after DB scan is done

2021-11-05 Thread Qihua Yang
If HybridSource cannot support JdbcSource, is there any approach I can try to sequentially read input from two sources, i.e. after reading data from the database, start reading data from the Kafka topic? Thanks, Qihua On Fri, Nov 5, 2021 at 10:44 PM Qihua Yang wrote: > Hi Austin, > > That is exactly what I want.

Re: stream consume from kafka after DB scan is done

2021-11-05 Thread Qihua Yang
Hi Austin, That is exactly what I want. Is it possible to use JdbcTableSource as the first source? It looks like only FileSource can be used as the first source. Below is the error: val jdbcSource = JdbcTableSource.builder() .setOptions(options) .setReadOptions(readOptions) .setLookupOpt

Re: stream consume from kafka after DB scan is done

2021-11-05 Thread Austin Cawley-Edwards
Hey Qihua, If I understand correctly, you should be able to do that with the HybridSource, released in 1.14 [1]. Best, Austin [1]: https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/hybridsource/ On Fri, Nov 5, 2021 at 3:53 PM Qihua Yang wrote: > Hi, > > Our stream h
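For reference, a minimal sketch of a HybridSource as described in [1], with hypothetical paths, broker addresses, and topic names. Note it accepts only implementations of the new unified Source interface (such as FileSource and KafkaSource), which is why a JDBC source cannot currently be plugged in as the first source:

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.connector.base.source.hybrid.HybridSource;
    import org.apache.flink.connector.file.src.FileSource;
    import org.apache.flink.connector.file.src.reader.TextLineFormat;
    import org.apache.flink.connector.kafka.source.KafkaSource;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class HybridExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

            // Bounded first source: historical data from files.
            FileSource<String> fileSource =
                FileSource.forRecordStreamFormat(
                        new TextLineFormat(), new Path("/tmp/history"))
                    .build();

            // Unbounded second source: switch to the Kafka topic afterwards.
            KafkaSource<String> kafkaSource =
                KafkaSource.<String>builder()
                    .setBootstrapServers("broker:9092")
                    .setTopics("my-topic")
                    .setValueOnlyDeserializer(new SimpleStringSchema())
                    .build();

            // Reads the file source to completion, then the Kafka source.
            HybridSource<String> hybrid =
                HybridSource.builder(fileSource).addSource(kafkaSource).build();

            env.fromSource(hybrid, WatermarkStrategy.noWatermarks(), "hybrid")
                .print();
            env.execute();
        }
    }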

Re: Running a performance benchmark load test of Flink on new CPUs

2021-11-05 Thread Austin Cawley-Edwards
Hi Vijay, I'm not too familiar with the subject, but maybe you could have a look at flink-faker [1], which generates fake data. I would think you could use it to write to Kafka in one Flink job, and then have another Flink job ingest it and run your benchmarks. There is also this microbenchmar
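A sketch of such a load-generator job using flink-faker via the Table API, assuming the faker connector jar is on the classpath; the schema, field expressions, rate, topic, and broker address are illustrative only:

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class LoadGenerator {
        public static void main(String[] args) throws Exception {
            TableEnvironment tEnv =
                TableEnvironment.create(
                    EnvironmentSettings.newInstance().inStreamingMode().build());

            // Source: flink-faker generates random rows at a configurable rate.
            tEnv.executeSql(
                "CREATE TEMPORARY TABLE fake_orders (\n"
              + "  order_id STRING,\n"
              + "  price DOUBLE\n"
              + ") WITH (\n"
              + "  'connector' = 'faker',\n"
              + "  'rows-per-second' = '10000',\n"
              + "  'fields.order_id.expression' = '#{Internet.uuid}',\n"
              + "  'fields.price.expression' = '#{number.numberBetween ''1'',''100''}'\n"
              + ")");

            // Sink: write the generated load to Kafka for the benchmark job.
            tEnv.executeSql(
                "CREATE TEMPORARY TABLE load_topic (\n"
              + "  order_id STRING,\n"
              + "  price DOUBLE\n"
              + ") WITH (\n"
              + "  'connector' = 'kafka',\n"
              + "  'topic' = 'benchmark-load',\n"
              + "  'properties.bootstrap.servers' = 'broker:9092',\n"
              + "  'format' = 'json'\n"
              + ")");

            tEnv.executeSql("INSERT INTO load_topic SELECT * FROM fake_orders")
                .await();
        }
    }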

Running a performance benchmark load test of Flink on new CPUs

2021-11-05 Thread Vijay Balakrishnan
Hi, I am a newbie to running a performance benchmark load test of Flink on new CPUs. Is there an *existing workload generator* that I can use with Kafka and then ingest with the Flink KafkaConnector to test the performance against various new chips on servers? Measuring CPU performance etc., vCPU us

stream consume from kafka after DB scan is done

2021-11-05 Thread Qihua Yang
Hi, Our stream has two sources: one is a Kafka topic, the other is a database. Is it possible to consume from the Kafka topic only after the DB scan is completed? We configured it in batch mode. Thanks, Qihua

Re: Flink sink data to DB and then commit data to Kafka

2021-11-05 Thread Qihua Yang
Hi Ali, Thank you so much! That is very helpful. Thanks, Qihua On Wed, Nov 3, 2021 at 2:46 PM Ali Bahadir Zeybek wrote: > Hello Qihua, > > This will require you to implement and maintain your own database insertion > logic using any of the clients that your database and programming language >

Re: Implementing a Custom Source Connector for Table API and SQL

2021-11-05 Thread Krzysztof Chmielewski
Thanks Fabian, I was looking forward to using the unified Source interface in my use case. The implementation was very intuitive with this new design. I will try with TableFunction then. Best. Krzysztof Chmielewski On Fri, Nov 5, 2021 at 2:20 PM Fabian Paul wrote: > Hi Krzysztof, > > The blog post

Re: to join or not to join, that is the question...

2021-11-05 Thread Marco Villalobos
In my situation, the streams are of the same type, which means union is an option. However, will creating a new stream with union perform more slowly than processing connected streams? I want to use the option that performs better. The logic on the data is actually very simple. But both streams w

Re: to join or not to join, that is the question...

2021-11-05 Thread Timo Walther
Union can be an option if you want to unify the streams first and then apply a keyBy on the common stream. Otherwise connect() is the way to go. See an example for joining here: https://github.com/twalthr/flink-api-examples/blob/main/src/main/java/com/ververica/Example_06_DataStream_Join.java
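A sketch of the union-then-keyBy variant for same-typed streams. After the union there is a single keyed operator, so elements from both inputs with the same key naturally see the same keyed state; the Tuple2 schema and sample elements are hypothetical:

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
    import org.apache.flink.util.Collector;

    public class UnionSharedState {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

            DataStream<Tuple2<String, Long>> streamA =
                env.fromElements(Tuple2.of("a", 1L), Tuple2.of("b", 2L))
                   .returns(Types.TUPLE(Types.STRING, Types.LONG));
            DataStream<Tuple2<String, Long>> streamB =
                env.fromElements(Tuple2.of("a", 3L))
                   .returns(Types.TUPLE(Types.STRING, Types.LONG));

            // One keyed operator after the union: both inputs share its state.
            streamA.union(streamB)
                .keyBy(t -> t.f0)
                .process(new CountPerKey())
                .print();

            env.execute();
        }

        static class CountPerKey
                extends KeyedProcessFunction<String, Tuple2<String, Long>, String> {
            private transient ValueState<Long> count;

            @Override
            public void open(Configuration parameters) {
                count = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("count", Long.class));
            }

            @Override
            public void processElement(
                    Tuple2<String, Long> value, Context ctx, Collector<String> out)
                    throws Exception {
                // State is per key, regardless of which input the element came from.
                Long c = count.value();
                long next = (c == null ? 0L : c) + 1;
                count.update(next);
                out.collect(value.f0 + " seen " + next + " times");
            }
        }
    }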

Re: to join or not to join, that is the question...

2021-11-05 Thread Dario Heinisch
Union creates a new stream containing all elements of the unioned streams: https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/datastream/operators/overview/#union On 05.11.21 14:25, Marco Villalobos wrote: Can two different streams flow to the same operator (an operator with t

to join or not to join, that is the question...

2021-11-05 Thread Marco Villalobos
Can two different streams flow to the same operator (an operator with the same name, uid, and implementation) and then share keyed state or will that require joining the streams first?

Re: Implementing a Custom Source Connector for Table API and SQL

2021-11-05 Thread Fabian Paul
Hi Krzysztof, The blog post is not building a lookup source but only a scan source. For scan sources you can choose between the old RichSourceFunction or the new unified Source interface. For lookup sources you need to implement either a TableFunction or an AsyncTableFunction; there are current
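For reference, a minimal sketch of the TableFunction-based lookup shape. The class names and lookup logic are hypothetical (a real connector would also query an external system and register itself through a factory); the JDBC implementation referenced later in this thread shows the full version:

    import org.apache.flink.table.connector.source.DynamicTableSource;
    import org.apache.flink.table.connector.source.LookupTableSource;
    import org.apache.flink.table.connector.source.TableFunctionProvider;
    import org.apache.flink.table.data.GenericRowData;
    import org.apache.flink.table.data.RowData;
    import org.apache.flink.table.data.StringData;
    import org.apache.flink.table.functions.TableFunction;

    public class MyLookupSource implements LookupTableSource {

        @Override
        public LookupRuntimeProvider getLookupRuntimeProvider(LookupContext context) {
            // context.getKeys() describes which columns are used as lookup keys.
            return TableFunctionProvider.of(new MyLookupFunction());
        }

        @Override
        public DynamicTableSource copy() {
            return new MyLookupSource();
        }

        @Override
        public String asSummaryString() {
            return "MyLookupSource";
        }

        /** Called once per lookup key; collect() emits each matching row. */
        public static class MyLookupFunction extends TableFunction<RowData> {
            public void eval(Object... keys) {
                // Query the external system here and emit the matches.
                collect(GenericRowData.of(
                    StringData.fromString("value-for-" + keys[0])));
            }
        }
    }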

Re: Implementing a Custom Source Connector for Table API and SQL

2021-11-05 Thread Krzysztof Chmielewski
Ok, I think there is some misunderstanding here. As presented in [1] for implementing a custom source connector for Table API and SQL: "You first need to have a source connector which can be used in Flink’s runtime system (...) For complex connectors, you may want to implement the Source

Re: Implementing a Custom Source Connector for Table API and SQL

2021-11-05 Thread Fabian Paul
I think neither Source nor RichSourceFunction is correct in this case. You can have a look at the JDBC lookup source [1][2]. Your function needs to implement TableFunction. Best, Fabian [1] https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-jdbc/src/main/java/org/apac

Re: Implementing a Custom Source Connector for Table API and SQL

2021-11-05 Thread Krzysztof Chmielewski
Thank you Fabian, what if I rewrote my custom source to use the old RichSourceFunction instead of the unified Source interface? Would it then work as a lookup? Best, Krzysztof On Fri, Nov 5, 2021 at 11:18 AM Fabian Paul wrote: > Hi Krzysztof, > > The unified Source is meant to be used for the DataStream

Re: Implementing a Custom Source Connector for Table API and SQL

2021-11-05 Thread Fabian Paul
Hi Krzysztof, The unified Source is meant to be used for the DataStream API and Table API. Currently, we do not have a definition of lookup sources in the DataStream API; therefore the new sources do not work as lookups but only as scan sources. Maybe in the future we also want to define lookups

Re: How to write unit test for stateful operators in Pyflink apps

2021-11-05 Thread Long Nguyễn
Thanks, Fabian. I'll check it out. Hope that Dian can also give me some advice. Best, Long On Fri, Nov 5, 2021 at 3:48 PM Fabian Paul wrote: > Hi, > > Since you want to use the Table API, you can probably write a more high-level > test around executing the complete program. Good examples are the

Re: Implementing a Custom Source Connector for Table API and SQL

2021-11-05 Thread Krzysztof Chmielewski
BTW, @Ingo Burk, you wrote that "the new, unified Source interface can only work as a scan source." Is there any special design reason behind it, or is it just simply not yet implemented? Thanks, Krzysztof Chmielewski On Thu, Nov 4, 2021 at 4:27 PM Krzysztof Chmielewski < krzysiek.chmielew...@gmail.co

Re: "sink.partition-commit.success-file.name" option in the FileSystem connector does not work

2021-11-05 Thread Long Nguyễn
Thank you, Paul. The answer is so clear and helpful. But I'm still wondering what the purpose of the sink.partition-commit.success-file.name option is if the sin

Rabbitmq connector and exchanges

2021-11-05 Thread ivan.ros...@agilent.com
Thanks Fabian, Yeah, looks like overriding setupQueue() could work, I'll give it a shot. I'm not exactly clear on what I mean by using a stateful function as a Flink sink. My first thought was to sink aggregations from long tumbling windows (say minutes to hours or more), assuming the stateful fu

Re: Question on BoundedOutOfOrderness

2021-11-05 Thread Oliver Moser
Thanks Guowei and Alexey, looking at the references you provided helped. I managed to put together simple examples using both the streaming Table API and the CEP library, and I’m now able to process events in event time order. I believe my interpretation of setting up watermar
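For reference, a minimal sketch of the usual watermark setup for bounded out-of-orderness, assuming a hypothetical Tuple2<String, Long> event whose f1 field carries the event-time millis:

    import java.time.Duration;
    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.java.tuple.Tuple2;

    // Watermarks trail the highest seen timestamp by 5 seconds, so events
    // arriving up to 5 seconds out of order are still handled in event time.
    WatermarkStrategy<Tuple2<String, Long>> strategy =
        WatermarkStrategy
            .<Tuple2<String, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(5))
            .withTimestampAssigner((event, recordTs) -> event.f1);

    // Attach to an existing stream:
    // stream.assignTimestampsAndWatermarks(strategy);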

Re: How to write unit test for stateful operators in Pyflink apps

2021-11-05 Thread Fabian Paul
Hi, Since you want to use the Table API, you can probably write a more high-level test around executing the complete program. Good examples are the pyflink example programs [1]. I also could not find something similar to the testing harness from Java. I cc'ed Dian; maybe he knows more about it. Be

Re: How to filter these emails?

2021-11-05 Thread Fabian Paul
Hi Ivan, Please try to always cc the user mailing list, otherwise others won’t see your question :) You are right, by default the RabbitMQ source seems to not support exchanges, but take a look at this [1]. I think you can inherit from the RMQSource class and override the setupQueue method to bind to
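A sketch of that approach, assuming the exchange name, type, and routing key are supplied by the caller; the exact declaration flags will depend on how the exchange and queue were created on the broker:

    import java.io.IOException;
    import org.apache.flink.api.common.serialization.DeserializationSchema;
    import org.apache.flink.streaming.connectors.rabbitmq.RMQSource;
    import org.apache.flink.streaming.connectors.rabbitmq.common.RMQConnectionConfig;

    // Subclass RMQSource and override setupQueue() so the consumed queue is
    // bound to an exchange instead of relying on the default declaration.
    public class ExchangeRMQSource<T> extends RMQSource<T> {

        private final String exchange;
        private final String routingKey;

        public ExchangeRMQSource(
                RMQConnectionConfig config,
                String queueName,
                String exchange,
                String routingKey,
                DeserializationSchema<T> schema) {
            super(config, queueName, schema);
            this.exchange = exchange;
            this.routingKey = routingKey;
        }

        @Override
        protected void setupQueue() throws IOException {
            // Declare the exchange and queue, then bind them together.
            channel.exchangeDeclare(exchange, "topic", true);
            channel.queueDeclare(queueName, true, false, false, null);
            channel.queueBind(queueName, exchange, routingKey);
        }
    }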

Re: Table DataStream Conversion Lost Watermark

2021-11-05 Thread Timo Walther
Hi Yunfeng, by default fromDataStream does not propagate watermarks into the Table API, because the Table API needs a time attribute in the schema that corresponds to the watermarking. A time attribute will also be put back into the stream record during toDataStream. Please take a look at: https:/
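A minimal sketch of declaring such a time attribute, assuming `tableEnv` is a StreamTableEnvironment and `stream` already carries record timestamps and watermarks from its source:

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.table.api.Schema;
    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
    import org.apache.flink.types.Row;

    Table table = tableEnv.fromDataStream(
        stream,
        Schema.newBuilder()
            // expose the stream-record timestamp as a rowtime column
            .columnByMetadata("rowtime", "TIMESTAMP_LTZ(3)")
            // reuse the watermarks of the incoming DataStream
            .watermark("rowtime", "SOURCE_WATERMARK()")
            .build());

    // toDataStream puts the time attribute back into the stream records.
    DataStream<Row> result = tableEnv.toDataStream(table);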

unsubscribe

2021-11-05 Thread Peter Schrott
unsubscribe

Re: "sink.partition-commit.success-file.name" option in the FileSystem connector does not work

2021-11-05 Thread Fabian Paul
Hi, Currently this is expected because the FileSink is built to support running with higher parallelism. Therefore it needs to periodically write files. The respective file names always have a descriptor so that the FileSink knows which files have already been written. You can read more about the
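For reference, a sketch of the relevant sink options in DDL form; the table schema, path, and the custom success-file name are illustrative. The sink.partition-commit.success-file.name option only takes effect when the 'success-file' commit policy is enabled:

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    TableEnvironment tEnv =
        TableEnvironment.create(
            EnvironmentSettings.newInstance().inStreamingMode().build());

    // A partitioned filesystem sink that marks completed partitions with a
    // success file (the default file name is _SUCCESS).
    tEnv.executeSql(
        "CREATE TABLE fs_sink (\n"
      + "  id STRING,\n"
      + "  dt STRING\n"
      + ") PARTITIONED BY (dt) WITH (\n"
      + "  'connector' = 'filesystem',\n"
      + "  'path' = 'file:///tmp/output',\n"
      + "  'format' = 'csv',\n"
      + "  'sink.partition-commit.policy.kind' = 'success-file',\n"
      + "  'sink.partition-commit.success-file.name' = '_MY_SUCCESS'\n"
      + ")");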

Re: How to filter these emails?

2021-11-05 Thread Fabian Paul
Hi Ivan, You can configure your email client to filter for messages going to user@flink.apache.org and move them into a separate folder. Best, Fabian

How to filter these emails?

2021-11-05 Thread ivan.ros...@agilent.com
Hello All, Apologies for spamming the list. As a new member, I am both happy to be getting emails and also pretty overwhelmed to have them appear directly in my inbox. As far as I can tell there is no sender email or other predictable information I can use to put these messages into a nice Fl

"sink.partition-commit.success-file.name" option in the FileSystem connector does not work

2021-11-05 Thread Long Nguyễn
Hi. I am trying to use the FileSystem connector to simply read data from a text file and then write that data to an output CSV file. I notice that Flink allows specifying the name of the output file by