pojo warning when using auto generated protobuf class

2021-04-23 Thread Prashant Deva
I am seeing this warning msg when trying to use a custom protobuf de/serializer with kafka source with auto generated java protobuf class: 18:41:31.164 [main] INFO org.apache.flink.api.java.typeutils.TypeExtractor - Class class com.xx.APITrace cannot be used as a POJO type because not all fields a

Writing to Avro from pyflink

2021-04-23 Thread Edward Yang
I've been trying to write to the avro format with pyflink 1.12.2 on ubuntu, I've tested my code with an iterator writing to csv and everything works as expected. Reading through the flink documentation I see that I should add jar dependencies to work with avro. I downloaded three jar files that I b

FLINK Invocation error

2021-04-23 Thread Vijayendra Yadav
Hi Team, While restarting Flink application from CHECKPOINT, facing the following Error(intermittently), but it does not impact Job getting submitted or functionality. But still wondering what could be the reason and solution ? *RUN Command:* /usr/lib/flink/bin/flink run \ -s *s3://bucket-app

Re: MemoryStateBackend Issue

2021-04-23 Thread Milind Vaidya
Hi Matthias, Yeah you are right. I am canceling the job and hence it is creating new job with new job id and hence it is no respecting previous checkpoint. I observed same behaviour even for local FS backend. Is there any way to simulated failing of job locally ? As far as config is concerned, I

Too man y checkpoint folders kept for externalized retention.

2021-04-23 Thread John Smith
Hi running 1.10.0. Just curious is this specific to externalized retention or checkpointing in general. I see my checkpoint folder counting thousands of chk-x folders. If using default checkpoint or NONE externalized checkpointing does the count of chk- folders grow indefinitely until th

Re: Official flink java client

2021-04-23 Thread gaurav kulkarni
Thanks for the response, folks! I plan to use the client mostly for monitoring status of jobs, probably to trigger savepoints too. I may extend it in future to submit jobs. Given RestClusterClient is not officially supported, I will probably build something myself. Agree with Flavio, it would b

Re: Correctly serializing "Number" as state in ProcessFunction

2021-04-23 Thread Miguel Araújo
Thanks for your replies. I agree this is a somewhat general problem. I posted it here as I was trying to register the valid subclasses in Kryo but I couldn't get the message to go away, i.e., everything worked correctly but there was the complaint that GenericType serialization was being used. Thi

Re: Question about snapshot file

2021-04-23 Thread David Anderson
Abdullah, ReadRidesAndFaresSnapshot [1] is an example that shows how to use the State Processor API to display the contents of a snapshot taken while running RidesAndFaresSolution [2]. Hopefully that will help you get started. [1] https://github.com/ververica/flink-training/blob/master/state-pro

Re: Question about snapshot file

2021-04-23 Thread Abdullah bin Omar
Hi, Thank you for your reply. I want to read the previous snapshot (if needed) at the time of operation. In [1], there is a portion: DataSet listState = savepoint.readListState<>( "my-uid", "list-state", Types.INT); here, will the function savepoint.readliststate<> () work to read

Re: Re: Re: Official flink java client

2021-04-23 Thread Flavio Pompermaier
Yes, that's a known risk. Indeed it would be awesome if the REST API would be published also using some format that allow automatic client generation (like swagger or openapi). Also release an official client could be an option otherwise...I think that it's very annoying to write a client from scra

Re: event-time window cannot become earlier than the current watermark by merging

2021-04-23 Thread Vishal Santoshi
Great, thanks for the update. The upfront filter does work and has for the last 24 hours and no reason why it should not. Again I have to note that there is no mail group that has been this reactive to issues, so thank you again. On Fri, Apr 23, 2021 at 4:34 AM Matthias Pohl wrote: > After h

Re: Re: Re: Official flink java client

2021-04-23 Thread Yun Gao
Hi Flavio, Got that, from my view I think RestClusterClient might not be viewed as public API, and might be change between version, thus it might need to be careful when upgrading. Best, Yun --Original Mail -- Sender:Flavio Pompermaier Send Date:Fri Apr 23

Re: Approaches for external state for Flink

2021-04-23 Thread Raghavendar T S
Hi Oğuzhan Take a look at bloom filter. You might get better ideas. Links: https://en.wikipedia.org/wiki/Bloom_filter https://stackoverflow.com/questions/4282375/what-is-the-advantage-to-using-bloom-filters https://redislabs.com/modules/redis-bloom/ Thank you On Fri, Apr 23, 2021 at 3:52 PM Oğu

Approaches for external state for Flink

2021-04-23 Thread Oğuzhan Mangır
I'm trying to design a stream flow that checks *de-duplicate* events and sends them to the Kafka topic. Basically, flow looks like that; kafka (multiple topics) => flink (checking de-duplication and event enrichment) => kafka (single topic) For de-duplication, I'm thinking of using Cassandra as

Re: event-time window cannot become earlier than the current watermark by merging

2021-04-23 Thread Matthias Pohl
After having talked to David about this issue offline, I decided to create a Jira ticket FLINK-22425 [1] to cover this. Thanks for reporting it on the mailing list, Vishal. Hopefully, the community has the chance to look into it. Best, Matthias [1] https://issues.apache.org/jira/browse/FLINK-2242

Re: Re: Official flink java client

2021-04-23 Thread Flavio Pompermaier
Obviously I could rewrite a java client from scratch that interface with the provided REST API but why if I can reuse something already existing? Usually I interface with REST API using auto generated clients (if APIs are exposed via Swagger or OpenApi). If that's not an option, writing a REST clie

Re: Re: Official flink java client

2021-04-23 Thread Yun Gao
Hi Falvio, Very thanks for the explanation, may be another option is to have a look at the http rest API[1] ? Flink provides official http api to submit jar jobs and query job status, and they might be able to help. Best, Yun [1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/rest

Re: Debezium CDC | OOM

2021-04-23 Thread Matthias Pohl
Got it. Thanks for clarifying. On Fri, Apr 23, 2021 at 6:36 AM Ayush Chauhan wrote: > Hi Matthias, > > I am using RocksDB as a state backend. I think the iceberg sink is not > able to propagate back pressure to the source which is resulting in OOM for > my CDC pipeline. > Please refer to this -

Re: MemoryStateBackend Issue

2021-04-23 Thread Matthias Pohl
One additional question: How did you stop and restart the job? The behavior you're expecting should work with stop-with-savepoint. Cancelling the job and then just restarting it wouldn't work. The latter approach would lead to a new job being created. Best, Matthias On Thu, Apr 22, 2021 at 3:12 P

Re: Official flink java client

2021-04-23 Thread Flavio Pompermaier
I also interface to Flink clusters using REST in order to avoid many annoying problems (due to dependency conflicts, classpath or env variables). I use an extended version of the RestClusterClient that you can reuse if you want to. It is available at [1] and it add some missing methods to the defau