Re: Kafka Consumer Retries Failing

2021-07-19 Thread Rahul Patwari
be a reason behind your Kafka issues. Hard to tell. Definitely > you shouldn't have heartbeat timeouts in your cluster, so something IS > wrong with your setup. > > Best, > Piotrek > > Thu, 15 Jul 2021 at 17:17, Rahul Patwari > wrote: > >> Thanks for the feedbac

Re: Kafka Consumer Retries Failing

2021-07-15 Thread Rahul Patwari
, but I don't see a way how one could cause the other. > > Piotrek > > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/ops/metrics/#default-shuffle-service > > Wed, 14 Jul 2021 at 14:18, Rahul Patwari > wrote: > >> Thanks, Piotrek. &g

Re: java.lang.Exception: Could not complete the stream element: Record @ 1626200691540 :

2021-07-15 Thread Rahul Patwari
; ResultFuture>, Map, List< > Map>>> resultFuture) { > //Timed out. Just discard > System.out.println("Timeout:" ); > > On Wed, Jul 14, 2021 at 9:40 AM Rahul Patwari > wrote: > >> Hi Ragini, >> >> From the stack trace, the job failed as t

Re: Kafka Consumer Retries Failing

2021-07-14 Thread Rahul Patwari
streaming.api.operators.StreamSource.run(StreamSource.java:63)\n\tat org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:201)\n\n" } On Wed, Jul 14, 2021 at 2:39 PM Piotr Nowojski wrote: > Hi, > > Waiting for memory from Lo

Re: java.lang.Exception: Could not complete the stream element: Record @ 1626200691540 :

2021-07-13 Thread Rahul Patwari
Hi Ragini, From the stack trace, the job failed because the Async I/O operator timed out for an event. The timeout is configurable. Please refer to https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/datastream/operators/asyncio/#async-io-api Quoting from the above documentation: > Timeout
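The behaviour described above can be sketched as follows. This is a hedged illustration, not the poster's actual job: it shows where the Async I/O timeout is set and how overriding the default timeout() method (which fails the job with the quoted exception) lets timed-out records be discarded instead. The class name and the string enrichment are illustrative assumptions.

```java
import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

// Hypothetical enrichment function; a real job would call an external
// service inside asyncInvoke.
public class EnrichmentFunction extends RichAsyncFunction<String, String> {

    @Override
    public void asyncInvoke(String input, ResultFuture<String> resultFuture) {
        CompletableFuture
            .supplyAsync(() -> input + "-enriched")
            .thenAccept(r -> resultFuture.complete(Collections.singleton(r)));
    }

    @Override
    public void timeout(String input, ResultFuture<String> resultFuture) {
        // The default implementation throws a TimeoutException, which is
        // what surfaces as "Could not complete the stream element".
        // Completing with an empty collection drops the record instead.
        resultFuture.complete(Collections.emptyList());
    }
}

// Wiring it in; the third argument is the timeout discussed in this thread:
// DataStream<String> enriched = AsyncDataStream.unorderedWait(
//         input, new EnrichmentFunction(), 30_000, TimeUnit.MILLISECONDS, 100);
```

Whether dropping on timeout is acceptable depends on the job's semantics; for at-least-once delivery a side output or retry is usually preferable to silent discard.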

Re: Kafka Consumer Retries Failing

2021-07-13 Thread Rahul Patwari
pper around the > KafkaConsumer class, so the thing you are observing has most likely very > little to do with the Flink itself. In other words, if you are observing > such a problem you most likely would be possible to reproduce it without > Flink. > > Best, > Pio

Kafka Consumer Retries Failing

2021-07-09 Thread Rahul Patwari
Hi, We have a Flink 1.11.1 streaming pipeline in production which reads from Kafka. Kafka server version is 2.5.0 - Confluent 5.5.0. Kafka client version is 2.4.1 - {"component":"org.apache.kafka.common.utils.AppInfoParser$AppInfo","message":"Kafka version: 2.4.1","method":""} Occasionally
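For context on the retry behaviour discussed in this thread, these are the Kafka 2.4.x consumer settings that govern heartbeats, request timeouts, and retry backoff. This is a hedged reference sketch, not the poster's configuration; the values shown are the client defaults and would need tuning per environment. In Java .properties syntax, comments must sit on their own lines:

```properties
# Consumer is dropped from the group if no heartbeat arrives in this window
session.timeout.ms=10000
# Interval at which heartbeats are sent to the group coordinator
heartbeat.interval.ms=3000
# Per-request timeout before the client retries the request
request.timeout.ms=30000
# Wait between retries of a failed fetch
retry.backoff.ms=100
# Cap on the exponential reconnect backoff
reconnect.backoff.max.ms=1000
```

When retries appear in the client log together with backpressure symptoms on the Flink side (e.g. "waiting for memory from LocalBufferPool", as quoted above), the retries are often a symptom rather than the cause.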

Re: Manage Kafka Offsets for Fault Tolerance Without Checkpointing

2021-04-13 Thread Rahul Patwari
. > > Also let me repeat these questions > >> 1. Why do you need the window operator at all? Couldn't you just >> backpressure on the async I/O by delaying the processing there? >> > 2. What's keeping you from attaching the offset of the Kafka records to A, &g

Re: Manage Kafka Offsets for Fault Tolerance Without Checkpointing

2021-04-13 Thread Rahul Patwari
y do you need the window operator at all? Couldn't you just backpressure > on the async I/O by delaying the processing there? Then there would be no > need to change anything. > > On Tue, Apr 13, 2021 at 10:00 AM Roman Khachatryan > wrote: > >> Hi Rahul, >> &g

Re: Manage Kafka Offsets for Fault Tolerance Without Checkpointing

2021-04-12 Thread Rahul Patwari
fault connector. > > Regards, > Roman > > On Mon, Apr 12, 2021 at 3:14 PM Rahul Patwari > wrote: > > > > Hi Roman, > > > > Thanks for the reply. > > This is what I meant by Internal restarts - Automatic restore of Flink > Job from a failure. For

Re: Manage Kafka Offsets for Fault Tolerance Without Checkpointing

2021-04-12 Thread Rahul Patwari
loss. > Otherwise (if you commit offsets earlier), you have to persist > in-flight records to avoid data loss (i.e. enable checkpointing). > > Regards, > Roman > > On Mon, Apr 12, 2021 at 12:11 PM Rahul Patwari > wrote: > > > > Hello, > > > > Con

Manage Kafka Offsets for Fault Tolerance Without Checkpointing

2021-04-12 Thread Rahul Patwari
Hello, *Context*: We have a stateless Flink pipeline which reads from Kafka topics. The pipeline has a windowing operator (used only for introducing a delay in processing records) and Async I/O operators (used for lookup/enrichment). "At least once" processing semantics are needed for the pipeline
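One relevant behaviour for this thread: when Flink checkpointing is disabled, the FlinkKafkaConsumer falls back to the Kafka client's own periodic offset commits, configured via the standard client properties below. This is a hedged sketch of those properties, not a recommendation; values shown are the client defaults:

```properties
# With checkpointing disabled, the Flink Kafka consumer relies on the
# Kafka client's periodic auto-commit for offset tracking.
enable.auto.commit=true
auto.commit.interval.ms=5000
```

The catch, as discussed later in the thread, is that auto-commit fires on a timer regardless of whether downstream operators (the window and Async I/O here) have finished with the records, so in-flight records can be lost on restart. At-least-once therefore requires either committing offsets only after processing completes, or enabling checkpointing.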

Re: sporadic "Insufficient no of network buffers" issue

2020-08-02 Thread Rahul Patwari
initely help in this case. can you please confirm our findings and probably suggest some possible ways to mitigate this issue. Rahul On Sat, Aug 1, 2020 at 9:24 PM Rahul Patwari wrote: > From the metrics in Prometheus, we observed that the minimum > AvailableMemorySegments out of all the ta

Re: sporadic "Insufficient no of network buffers" issue

2020-08-01 Thread Rahul Patwari
uired for > your job. I never used 1.4.x, so don’t know about it. > > Ivan > > On Jul 31, 2020, at 11:37 PM, Rahul Patwari > wrote: > > Thanks for your reply, Ivan. > > I think taskmanager.network.memory.max is by default 1GB. > In my case, the network buffers memo

Re: sporadic "Insufficient no of network buffers" issue

2020-07-31 Thread Rahul Patwari
f proportion to have 1GB network buffer with 4GB total RAM. Reducing > number of shuffling will require less network buffer. But if your job need > the shuffling, then you may consider to add more memory to TM. > > Thanks, > Ivan > > On Jul 31, 2020, at 2:02 PM, Rahul Patwari &

sporadic "Insufficient no of network buffers" issue

2020-07-31 Thread Rahul Patwari
Hi, We are observing the "Insufficient number of Network Buffers" issue sporadically after Flink was upgraded from 1.4.2 to 1.8.2. The state of the tasks with this issue transitioned from DEPLOYING to FAILED. Whenever this issue occurs, the job manager restarts. Sometimes, the issue goes away after the re
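In Flink 1.8, network buffer memory is carved out of the task manager's memory as a fraction clamped between a minimum and a maximum, which is what the replies in this thread tune. A hedged flink-conf.yaml sketch with the 1.8 defaults (these keys were renamed in later releases):

```yaml
# Fraction of TM memory reserved for network buffers, clamped to [min, max]
taskmanager.network.memory.fraction: 0.1
taskmanager.network.memory.min: 64mb
taskmanager.network.memory.max: 1gb
# Size of each network buffer segment
taskmanager.memory.segment-size: 32kb
```

The number of buffers a job needs grows with the number of shuffles and roughly with slots-per-TM squared, so either raising the max/fraction or reducing shuffling (as suggested below) relieves the pressure.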