Re: Issues running multiple Jobs using the same JAR

2021-03-01 Thread Morgan Geldenhuys
med sinks. When EXACTLY_ONCE semantics are enabled for the KafkaProducers we run into a lot of ProducerFencedExceptions and all jobs go into a restart cycle. FLINK-11654: https://issues.apache.org/jira/browse/FLINK-11654 Best, Kezhu Wang On February 28, 2021 at 22:35:02, Morgan Geldenh

Issues running multiple Jobs using the same JAR

2021-02-28 Thread Morgan Geldenhuys
Greetings all, I am having an issue instantiating multiple Flink jobs using the same JAR in the same Flink native cluster (all 1.12.1). When processing events, the jobs fail with the following trace: org.apache.kafka.common.KafkaException: Cannot perform send because at least one previous

Request: Documentation for External Communication with Flink Cluster

2020-06-15 Thread Morgan Geldenhuys
Hi Community, I am interested in creating an external client for submitting and managing Flink jobs via an HTTP/REST endpoint. Taking a look at the documentation, external communication is possible with the Dispatcher and JobManager

Flink on Kubernetes unable to Recover from failure

2020-05-05 Thread Morgan Geldenhuys
Community, I am currently doing some fault tolerance testing for Flink (1.10) running on Kubernetes (1.18) and am encountering an error where after a running job experiences a failure, the job fails completely. A Flink session cluster has been created according to the documentation

A Strategy for Capacity Testing

2020-04-23 Thread Morgan Geldenhuys
Community, I am interested in knowing the recommended way to do capacity planning for a particular Flink application with its current resource allocation. Taking a look at the Flink documentation

Failure detection and Heartbeats

2020-03-10 Thread Morgan Geldenhuys
Hi community, I am interested in knowing more about the failure detection mechanism used by Flink. Unfortunately, information is a little thin on the ground and I was hoping someone could shed a little light on the topic. Looking at the documentation

Re: How to determine average utilization before backpressure kicks in?

2020-02-25 Thread Morgan Geldenhuys
by using Flink metrics. Please see the documentation for more details: https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html Regards, Roman On Tue, Feb 25, 2020 at 12:33 PM Morgan Geldenhuys <morgan.geldenh...@tu-berlin.de> wrote: Hello communit

How to determine average utilization before backpressure kicks in?

2020-02-25 Thread Morgan Geldenhuys
Hello community, I am fairly new to Flink and have a question concerning utilization. I was hoping someone could help. Knowing that backpressure is essentially the point at which utilization has reached 100% for any particular streaming pipeline and means that the application cannot "keep

Identifying Flink Operators of the Latency Metric

2020-02-18 Thread Morgan Geldenhuys
Hi All, I have set up monitoring for Flink (1.9.2) via Prometheus and am interested in viewing the end-to-end latency at the sink operators at the 95th percentile. I have enabled latency markers at the operator level and can see the results, one of the entries looks as follows:

Re: Question: Determining Total Recovery Time

2020-02-11 Thread Morgan Geldenhuys
own metrics as part of your pipeline definition. Regards, Timo On 03.02.20 12:20, Morgan Geldenhuys wrote: > Community, > > I am interested in determining the total time to recover for a Flink > application after experiencing a partial failure. Let's ass

Re: Flink TaskManager Logs polluted with InfluxDB metrics reporter errors

2020-02-06 Thread Morgan Geldenhuys
-table-uber_2.12 1.9.0 provided org.apache.flink flink-connector-kafka_2.12 1.9.0 org.apache.commons commons-lang3 3.9 log4j log4j 1.2.17 junit junit 4.12 On 06.02.20 10:58, Chesnay Schepler wrote: Setup-wise, are there any differenc

Re: Flink TaskManager Logs polluted with InfluxDB metrics reporter errors

2020-02-06 Thread Morgan Geldenhuys
717 As the job continues this list of series just gets longer and longer. It was working perfectly fine a few months ago, but no idea what is happening now. Any ideas? On 06.02.20 10:23, Chesnay Schepler wrote: What InfluxDB version are you using? On 05/02/2020 19:41, Morgan Geldenhuys wrote: I

Flink TaskManager Logs polluted with InfluxDB metrics reporter errors

2020-02-05 Thread Morgan Geldenhuys
I am trying to set up metrics reporting for Flink using InfluxDB, however I am receiving tons of exceptions (listed right at the bottom). Reporting is set up as recommended by the documentation: metrics.reporter.influxdb.class: org.apache.flink.metrics.influxdb.InfluxdbReporter

Question: Determining Total Recovery Time

2020-02-03 Thread Morgan Geldenhuys
Community, I am interested in determining the total time to recover for a Flink application after experiencing a partial failure. Let's assume a pipeline consisting of Kafka -> Flink -> Kafka with Exactly-Once guarantees enabled. Taking a look at the documentation