Re: On efficient checkpoints with dynamic (self-evolving) keyed state

2020-04-06 Thread Salva Alcántara
I guess another option not mentioned in my question could be to use a custom serializer for the models. This way, I would not need to handle serialization myself within the process function, and snapshots of my models would be taken only once per checkpoint, as desired.
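A custom serializer boils down to defining how a model is written to and read from bytes. The sketch below shows the byte-level round trip such a serializer might perform; the "model" here is a made-up stand-in (a plain weight vector), and in real Flink this logic would live inside a `TypeSerializer`'s `serialize()`/`deserialize()` methods rather than a standalone class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch of a compact, length-prefixed binary encoding for a model.
// All names are illustrative, not Flink API.
class ModelCodec {

    static byte[] serialize(double[] weights) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeInt(weights.length);   // length prefix
            for (double w : weights) {
                out.writeDouble(w);         // fixed-width payload
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen with in-memory streams
        }
        return bos.toByteArray();
    }

    static double[] deserialize(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            double[] weights = new double[in.readInt()];
            for (int i = 0; i < weights.length; i++) {
                weights[i] = in.readDouble();
            }
            return weights;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With an encoding like this registered as the state's serializer, Flink snapshots the bytes once per checkpoint and the process function never deals with serialization directly.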

Re: On efficient checkpoints with dynamic (self-evolving) keyed state

2020-04-06 Thread Salva Alcántara
Yet another option would be to use operator state instead, but this looks trickier to me.

Upgrading Flink

2020-04-06 Thread Stephen Connolly
Quick questions on upgrading Flink. All our jobs are compiled against Flink 1.8.x. We are planning to upgrade to 1.10.x. 1. Is the recommended path to upgrade one minor version at a time, i.e. 1.8.x -> 1.9.x and then 1.9.x -> 1.10.x as a second step, or is the big jump supported, i.e. 1.8.x -> 1.10.x in on

Re: “feedback loop” and checkpoints in iterative streams

2020-04-06 Thread Igal Shilman
Hi, I don't know what the status of iterations is at the moment, or whether the community has plans to work on that, but I would like to point you to Flink Stateful Functions [1], a recent contribution to Apache Flink that allows building applications composed of stateful functions that can invok

Creating singleton objects per task manager

2020-04-06 Thread Salva Alcántara
I need to create a singleton (manager) object to be used within all the parallel instances of my UDF operator (a `ProcessFunction`). What is the proper way of creating such a singleton object per task manager?

how to hold a stream until another stream is drained?

2020-04-06 Thread 刘宇宝
I’m using JDBCInputFormat to read snapshot of a MySQL table and FlinkKafkaConsumer to read binlog which is written to Kafka by Debezium. DataStream binlogStream = env.addSource(new FlinkKafkaConsumer(…)); DataStream snapshotStream = env.createInput(JDBCInputFormat.buildJDBCInputFormat()….); //
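A common workaround for this ordering problem is a two-input operator that buffers the unbounded (binlog) side until the bounded (snapshot) side is exhausted. The sketch below shows that buffering logic in plain Java; it is not Flink API (in Flink this would be a `CoProcessFunction` or a custom `TwoInputStreamOperator`, with the buffer kept in managed state so it survives failures), and all names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

// Plain-Java sketch of "hold one stream until the other is drained":
// snapshot records pass through, binlog records are buffered until the
// snapshot side signals completion, then the buffer is flushed in order.
class HoldBackMerger<T> {
    private final Queue<T> buffered = new ArrayDeque<>();
    private final Consumer<T> downstream;
    private boolean snapshotDrained = false;

    HoldBackMerger(Consumer<T> downstream) {
        this.downstream = downstream;
    }

    // Records from the bounded snapshot side are emitted immediately.
    void onSnapshotRecord(T record) {
        downstream.accept(record);
    }

    // Records from the unbounded binlog side are held back until the
    // snapshot side reports completion.
    void onBinlogRecord(T record) {
        if (snapshotDrained) {
            downstream.accept(record);
        } else {
            buffered.add(record);
        }
    }

    // Called once the snapshot input is exhausted: flush the buffer
    // and switch to pass-through mode.
    void onSnapshotDrained() {
        snapshotDrained = true;
        T r;
        while ((r = buffered.poll()) != null) {
            downstream.accept(r);
        }
    }
}
```

The hard part in a real job is detecting "snapshot drained" reliably; a sentinel record or the end-of-input notification of the bounded source are the usual signals.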

Re: Creating singleton objects per task manager

2020-04-06 Thread Seth Wiesman
Hi Salva, One TaskManager == One JVM. There is nothing Flink specific here, you can just create a singleton how you would in any other JVM application. But be careful, if your singleton does any sort of locking/coordination it will quickly become the bottleneck in your application. I would strongl
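Since one TaskManager is one JVM, a standard JVM singleton works; the initialization-on-demand holder idiom below is one thread-safe way to write it (the class name is illustrative). One caveat worth noting: with Flink's per-job classloading, "singleton" really means one instance per classloader, so a job restart may see a fresh instance.

```java
// Per-JVM (i.e. per-TaskManager) singleton using the
// initialization-on-demand holder idiom. Illustrative sketch.
class ModelManager {

    private ModelManager() { }  // prevent external instantiation

    // The holder class is loaded on first access to getInstance(),
    // and JVM class loading guarantees thread-safe, lazy, exactly-once
    // initialization without explicit synchronization.
    private static final class Holder {
        static final ModelManager INSTANCE = new ModelManager();
    }

    static ModelManager getInstance() {
        return Holder.INSTANCE;
    }
}
```

All parallel subtasks running in the same TaskManager (e.g. calling `getInstance()` from `open()` of the `ProcessFunction`) then share the same instance.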

New kafka producer on each checkpoint

2020-04-06 Thread Maxim Parkachov
Hi everyone, I'm trying to test exactly-once functionality with my job under production load. The job reads from Kafka, uses the Kafka timestamp as event time, aggregates every minute, and outputs to another Kafka topic. I use a checkpoint interval of 10 seconds. Everything seems to be working fine, bu
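Seeing new producers per checkpoint is consistent with how Flink's exactly-once Kafka sink works: it commits one Kafka transaction per checkpoint and cycles through a pool of transactional producers, so producer creation shows up in the logs at roughly the checkpoint interval. A related gotcha is the transaction timeout: Flink's producer default (1 hour) exceeds the broker-side `transaction.max.timeout.ms` default (15 minutes), so it usually has to be lowered explicitly. The sketch below only builds the relevant `Properties`; the broker address is hypothetical.

```java
import java.util.Properties;

// Producer settings relevant to exactly-once Kafka output from Flink.
// Values are illustrative.
class ExactlyOnceProps {

    static Properties build() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092"); // hypothetical address
        // Must not exceed the broker's transaction.max.timeout.ms
        // (15 minutes by default); Flink's own default of 1 hour would
        // be rejected by an unmodified broker.
        props.setProperty("transaction.timeout.ms", String.valueOf(15 * 60 * 1000));
        return props;
    }
}
```

These properties would then be passed to the Kafka producer/sink constructor along with exactly-once semantics enabled.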

Pulsar as a state backend

2020-04-06 Thread Michael Colson
Hello, I recently browsed this post: https://flink.apache.org/2019/05/03/pulsar-flink.html and mainly this part: *Finally, an alternative way of integrating the technologies could include using Pulsar as a state backend with Flink. Since Pulsar has a layered architecture (Streams and Segmented Streams, po

State Processor API with Beam

2020-04-06 Thread Stephen Patel
I've got an apache beam pipeline running on flink (1.9.1). I've been attempting to read a RocksDB savepoint taken from this beam-on-flink pipeline, using the state processor api, however it seems to have some incompatibilities around namespaces. Beam for instance uses a String namespace, while th

Re: State Processor API with Beam

2020-04-06 Thread Seth Wiesman
Hi Stephen, You will need to implement a custom operator and use the `transform` method. It's not just that you need to specify the namespace type; you will also need to look into the Beam internals to see how it stores data in Flink state, how it translates between Beam serializers and flink

Re: Flink job getting killed

2020-04-06 Thread Fabian Hueske
Hi Giriraj, This looks like the deserialization of a String failed. Can you isolate the problem to a pair of sending and receiving tasks? Best, Fabian Am So., 5. Apr. 2020 um 20:18 Uhr schrieb Giriraj Chauhan < graj.chau...@gmail.com>: > Hi, > > We are submitting a flink(1.9.1) job for data pro

Re: Storing Operator state in RocksDb during runtime - plans

2020-04-06 Thread Fabian Hueske
Hi Kristoff, I'm not aware of any concrete plans for such a feature. Best, Fabian Am So., 5. Apr. 2020 um 22:33 Uhr schrieb KristoffSC < krzysiek.chmielew...@gmail.com>: > Hi, > according to [1] operator state and broadcast state (which is a "special" > type of operator state) are not stored in

Re: how to hold a stream until another stream is drained?

2020-04-06 Thread Fabian Hueske
Hi, With Flink streaming operators [...] However, these parts are currently being reworked to enable a better integration of batch and streaming use cases (or hybrid use cases such as yours). A while back, we wrote a blog post about these plans [1]: > *"Unified Stream Operators:* Blink extends the Fli

upgrade flink from 1.9.1 to 1.10.0 on EMR

2020-04-06 Thread aj
Hello All, I am running Flink on AWS EMR; currently the latest version available on EMR is 1.9.1, but I want to upgrade to 1.10.0. I tried to manually replace the lib jars by downloading the 1.10.0 version, but this is not working. I am getting the following exception when trying to submit a job on y

Re: flink 1.9 conflict jackson version

2020-04-06 Thread aj
Hi Fanbin, I am facing a similar kind of issue. Let me know if you were able to resolve it; if so, please help me as well: https://stackoverflow.com/questions/61012350/flink-reading-a-s3-file-causing-jackson-dependency-issue On Tue, Dec 17, 2019 at 7:50 AM ouywl wrote: > Hi Bu >I think I
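Dependency clashes like this are often resolved by shading (relocating) the conflicting library inside the job jar so it cannot collide with the version on Flink's classpath. Below is a sketch of a maven-shade-plugin relocation for Jackson; the shaded package prefix is illustrative and the plugin version/other configuration are omitted.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <!-- Move the job's Jackson classes into a private package -->
        <pattern>com.fasterxml.jackson</pattern>
        <shadedPattern>my.project.shaded.com.fasterxml.jackson</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```

After shading, the job uses its own relocated Jackson while Flink and the S3 filesystem connector keep theirs.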