Hello Kanchi,

Please open a separate thread or issue for the problem you are encountering, since your data pipeline is distinct from the one Neha described. That way each issue can be addressed individually and efficiently.

Hello Neha,

I am not sure yet what is causing the issue you are encountering. Could you share details of the Flink cluster: the number of TaskManagers, the memory and CPU allocated to each, and the spare capacity available on the server?

Your issue seems to be associated with the GlobalWindowAggregate and UnsortedAlert sink section of the pipeline, since in both the Error 2 and Error 3 snippets the TaskManager reports the error below:

org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable [] - GlobalWindowAggregate[25] -> (Calc[26] -> UnsortedAlert[27]: Writer -> UnsortedAlert[27]: Committer, Calc[390] -> UnsortedAlert[391]: Writer -> UnsortedAlert[391]: Committer) (7/20)#0

On Thu, Feb 15, 2024 at 12:07 PM Kanchi Masalia via user <user@flink.apache.org> wrote:

> Hi!
>
> We just encountered a similar issue.
>
> This is usually caused by: 1) Akka failed sending the message silently,
> due to problems like oversized payload or serialization failures. In that
> case, you should find detailed error information in the logs. 2) The
> recipient needs more time for responding, due to problems like slow
> machines or network jitters. In that case, you can try to increase
> akka.ask.timeout.
>
> Were you able to resolve the issue? Could you suggest what worked for you?
>
> Thanks,
> Kanchi Masalia
>
> On Tue, Aug 29, 2023 at 6:18 AM Neha Rawat <n.ra...@netwitness.com> wrote:
>
>> Thanks for your response.
>>
>> I did check and the memory utilization looks fine. Attaching a VisualVM
>> screenshot. Memory usage was well under the limits.
>>
>> There are phases of low CPU consumption by Flink (below 10% with spikes
>> that go up to 100%) and the number of threads goes down as well. That’s the
>> time when I see Errors #1, #2 and #3 as listed in the original email.
>>
>> Thanks,
>>
>> Neha
>>
>> *From:* Kenan Kılıçtepe <kkilict...@gmail.com>
>> *Sent:* Monday, August 28, 2023 4:25 PM
>> *To:* Neha Rawat <n.ra...@netwitness.com>
>> *Cc:* user@flink.apache.org
>> *Subject:* Re: Task Manager getting killed while executing sql queries.
>>
>> Can it be a memory leak? Have you observed the memory consumption of
>> task managers?
>>
>> Once, a task manager crash happened for me and it was an OOM.
>>
>> On Mon, Aug 28, 2023 at 9:12 PM Neha Rawat <n.ra...@netwitness.com>
>> wrote:
>>
>> Hi,
>>
>> Need some help with the below situation. It would be great if someone
>> could give some pointers on how to resolve this.
>>
>> I am trying to execute 80 SQL queries on data coming in from source –
>> Kafka Topic Event. The results are written to 2 sinks – Kafka topics –
>> Alert & UnsortedAlert.
>>
>> The input data is coming in at a rate of about 230K events/sec.
>>
>> - Kafka was up and running all the time on the same host as Flink.
>> - *Job configurations=>* --parallelism 20 --checkpoint-interval 60 --aligned-checkpoint-timeout 5 --min-pause-checkpoints 5 --table-state-ttl 360
>> - *Flink config -*
>>
>> state.backend.rocksdb.localdir: /var/netwitness/flink/rocksdb
>> state.backend.rocksdb.memory.write-buffer-ratio: 0.75
>> state.backend.rocksdb.log.level: WARN_LEVEL
>> state.backend.rocksdb.log.dir: /dev/null
>> state.backend.rocksdb.log.max-file-size: 1MB
>> state.backend.rocksdb.log.file-num: 1
>> state.backend.rocksdb.thread.num: 8
>> execution.checkpointing.timeout: 20 min
>>
>> Have also tried the same test with a Flink config that has changelog
>> enabled, and have changed min-pause-checkpoints to 50, but the observation
>> remains the same.
>>
>> *Observations*-
>>
>> - Checkpointing initially takes less than a second, but as the test
>>   progresses there are phases (3-4 consecutive checkpoints) where it takes
>>   more than a minute and sometimes up to 9 minutes.
>> - Task manager gets killed after a couple of hours.
>> - These are the errors in taskExecutor logs –
>>
>> Error 1-
>>
>> 2023-08-22 22:53:59,441 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=Event-16, groupId=Event] Disconnecting from node 1 due to request timeout.
>> 2023-08-22 22:53:59,441 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=Event-16, groupId=Event] Cancelled in-flight FETCH request with correlation id 445553 due to node 1 being disconnected (elapsed time since creation: 147696ms, elapsed time since send: 147696ms, request timeout: 30000ms)
>> 2023-08-22 22:53:59,441 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=Event-16, groupId=Event] Cancelled in-flight METADATA request with correlation id 445555 due to node 1 being disconnected (elapsed time since creation: 1ms, elapsed time since send: 1ms, request timeout: 30000ms)
>> 2023-08-22 22:53:59,441 INFO org.apache.kafka.clients.FetchSessionHandler [] - [Consumer clientId=Event-16, groupId=Event] Error sending fetch request (sessionId=1726574973, epoch=367) to node 1:
>> org.apache.kafka.common.errors.DisconnectException: null
>> 2023-08-22 22:54:06,975 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=Event-19, groupId=Event] Disconnecting from node 1 due to request timeout.
>> 2023-08-22 22:54:06,975 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=Event-19, groupId=Event] Cancelled in-flight FETCH request with correlation id 446105 due to node 1 being disconnected (elapsed time since creation: 42100ms, elapsed time since send: 42100ms, request timeout: 30000ms)
>> 2023-08-22 22:54:06,975 INFO org.apache.kafka.clients.FetchSessionHandler [] - [Consumer clientId=Event-19, groupId=Event] Error sending fetch request (sessionId=1951018275, epoch=1686) to node 1:
>> org.apache.kafka.common.errors.DisconnectException: null
>> 2023-08-22 23:03:17,824 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=Event-1, groupId=Event] Disconnecting from node 1 due to request timeout.
>> 2023-08-22 23:03:17,825 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=Event-1, groupId=Event] Cancelled in-flight FETCH request with correlation id 438228 due to node 1 being disconnected (elapsed time since creation: 61377ms, elapsed time since send: 61377ms, request timeout: 30000ms)
>> 2023-08-22 23:03:17,826 INFO org.apache.kafka.clients.FetchSessionHandler [] - [Consumer clientId=Event-1, groupId=Event] Error sending fetch request (sessionId=869666967, epoch=5) to node 1:
>> org.apache.kafka.common.errors.DisconnectException: null
>> 2023-08-22 23:05:06,655 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=Event-3, groupId=Event] Disconnecting from node 1 due to request timeout.
>> 2023-08-22 23:05:06,655 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=Event-3, groupId=Event] Cancelled in-flight FETCH request with correlation id 439255 due to node 1 being disconnected (elapsed time since creation: 79499ms, elapsed time since send: 79499ms, request timeout: 30000ms)
>> 2023-08-22 23:05:06,656 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=Event-3, groupId=Event] Cancelled in-flight METADATA request with correlation id 439256 due to node 1 being disconnected (elapsed time since creation: 1ms, elapsed time since send: 1ms, request timeout: 30000ms)
>> 2023-08-22 23:05:06,656 INFO org.apache.kafka.clients.FetchSessionHandler [] - [Consumer clientId=Event-3, groupId=Event] Error sending fetch request (sessionId=26073738, epoch=4) to node 1:
>> org.apache.kafka.common.errors.DisconnectException: null
>> 2023-08-22 23:07:00,107 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=Event-9, groupId=Event] Disconnecting from node 1 due to request timeout.
>> 2023-08-22 23:07:00,107 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=Event-9, groupId=Event] Cancelled in-flight FETCH request with correlation id 441161 due to node 1 being disconnected (elapsed time since creation: 44847ms, elapsed time since send: 44847ms, request timeout: 30000ms)
>> 2023-08-22 23:07:00,108 INFO org.apache.kafka.clients.FetchSessionHandler [] - [Consumer clientId=Event-9, groupId=Event] Error sending fetch request (sessionId=2113325484, epoch=10) to node 1:
>> org.apache.kafka.common.errors.DisconnectException: null
>> 2023-08-22 23:10:07,342 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=Event-12, groupId=Event] Disconnecting from node 1 due to request timeout.
>> 2023-08-22 23:10:07,343 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer clientId=Event-12, groupId=Event] Cancelled in-flight FETCH request with correlation id 440678 due to node 1 being disconnected (elapsed time since creation: 51417ms, elapsed time since send: 51417ms, request timeout: 30000ms)
>>
>> Error 2-
>>
>> org.apache.kafka.common.errors.DisconnectException: null
>> 2023-08-22 23:10:18,246 INFO org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable [] - GlobalWindowAggregate[25] -> (Calc[26] -> UnsortedAlert[27]: Writer -> UnsortedAlert[27]: Committer, Calc[390] -> UnsortedAlert[391]: Writer -> UnsortedAlert[391]: Committer) (6/20)#0 - asynchronous part of checkpoint 5450 could not be completed.
>> java.util.concurrent.CancellationException: null
>>     at java.util.concurrent.FutureTask.report(FutureTask.java:121) ~[?:?]
>>     at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:?]
>>     at org.apache.flink.util.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:544) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:57) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.finalizeNonFinishedSnapshots(AsyncCheckpointRunnable.java:191) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:124) [flink-dist-1.17.1.jar:1.17.1]
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
>>     at java.lang.Thread.run(Thread.java:829) [?:?]
>> 2023-08-22 23:10:18,245 INFO org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable [] - GlobalWindowAggregate[31] -> (Calc[32] -> UnsortedAlert[33]: Writer -> UnsortedAlert[33]: Committer, Calc[392] -> UnsortedAlert[393]: Writer -> UnsortedAlert[393]: Committer) (12/20)#0 - asynchronous part of checkpoint 5450 could not be completed.
>> java.util.concurrent.CancellationException: null
>>     at java.util.concurrent.FutureTask.report(FutureTask.java:121) ~[?:?]
>>     at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:?]
>>     at org.apache.flink.util.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:544) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:54) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.finalizeNonFinishedSnapshots(AsyncCheckpointRunnable.java:191) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:124) [flink-dist-1.17.1.jar:1.17.1]
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
>>     at java.lang.Thread.run(Thread.java:829) [?:?]
>>
>> Error 3-
>>
>> 2023-08-22 23:10:23,647 INFO org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable [] - GlobalWindowAggregate[25] -> (Calc[26] -> UnsortedAlert[27]: Writer -> UnsortedAlert[27]: Committer, Calc[390] -> UnsortedAlert[391]: Writer -> UnsortedAlert[391]: Committer) (7/20)#0 - asynchronous part of checkpoint 5450 could not be completed.
>> java.util.concurrent.ExecutionException: org.apache.flink.runtime.checkpoint.CheckpointException: The checkpoint was aborted due to exception of other subtasks sharing the ChannelState file.
>>     at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) ~[?:?]
>>     at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) ~[?:?]
>>     at org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:66) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.finalizeNonFinishedSnapshots(AsyncCheckpointRunnable.java:191) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:124) [flink-dist-1.17.1.jar:1.17.1]
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
>>     at java.lang.Thread.run(Thread.java:829) [?:?]
>> Caused by: org.apache.flink.runtime.checkpoint.CheckpointException: The checkpoint was aborted due to exception of other subtasks sharing the ChannelState file.
>>     at org.apache.flink.runtime.checkpoint.channel.ChannelStateCheckpointWriter.fail(ChannelStateCheckpointWriter.java:298) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.runtime.checkpoint.channel.ChannelStateWriteRequestDispatcherImpl.failAndClearWriter(ChannelStateWriteRequestDispatcherImpl.java:212) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.runtime.checkpoint.channel.ChannelStateWriteRequestDispatcherImpl.handleCheckpointAbortRequest(ChannelStateWriteRequestDispatcherImpl.java:189) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.runtime.checkpoint.channel.ChannelStateWriteRequestDispatcherImpl.dispatchInternal(ChannelStateWriteRequestDispatcherImpl.java:129) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.runtime.checkpoint.channel.ChannelStateWriteRequestDispatcherImpl.dispatch(ChannelStateWriteRequestDispatcherImpl.java:94) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.runtime.checkpoint.channel.ChannelStateWriteRequestExecutorImpl.loop(ChannelStateWriteRequestExecutorImpl.java:161) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.runtime.checkpoint.channel.ChannelStateWriteRequestExecutorImpl.run(ChannelStateWriteRequestExecutorImpl.java:116) ~[flink-dist-1.17.1.jar:1.17.1]
>>     ... 1 more
>> Caused by: java.util.concurrent.CancellationException: checkpoint aborted via notification
>>     at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.notifyCheckpoint(SubtaskCheckpointCoordinatorImpl.java:455) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.notifyCheckpointAborted(SubtaskCheckpointCoordinatorImpl.java:409) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$notifyCheckpointAbortAsync$17(StreamTask.java:1387) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$notifyCheckpointOperation$19(StreamTask.java:1410) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMail(MailboxProcessor.java:398) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMailsWhenDefaultActionUnavailable(MailboxProcessor.java:367) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:352) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:229) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:839) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:788) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:952) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:931) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:745) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562) ~[flink-dist-1.17.1.jar:1.17.1]
>>
>> Error 4 –
>>
>> 2023-08-23 10:00:52,852 WARN org.apache.flink.runtime.taskmanager.Task [] - Source: Event[10] -> (MiniBatchAssigner[11] -> (Calc[12] -> (LocalWindowAggregate[13], LocalWindowAggregate[381]), Calc[22] -> LocalWindowAggregate[23], Calc[28] -> LocalWindowAggregate[29], Calc[34] -> (LocalWindowAggregate[35], LocalWindowAggregate[394]), Calc[40] -> LocalWindowAggregate[41], Calc[46] -> (LocalWindowAggregate[47], LocalWindowAggregate[401]), Calc[52] -> LocalWindowAggregate[53], Calc[58] -> (WindowTableFunction[59] -> Calc[60] -> LocalWindowAggregate[61], WindowTableFunction[408] -> Calc[409] -> LocalWindowAggregate[410]), Calc[66] -> LocalWindowAggregate[67], Calc[72] -> WindowTableFunction[73] -> Calc[74] -> LocalWindowAggregate[75], Calc[80] -> LocalWindowAggregate[81], Calc[86] -> LocalWindowAggregate[87], Calc[92] -> LocalWindowAggregate[93], Calc[98] -> (WindowTableFunction[99] -> Calc[100] -> LocalWindowAggregate[101], WindowTableFunction[425] -> Calc[426] -> LocalWindowAggregate[427]), Calc[106] -> (WindowTableFunction[107] -> Calc[108] -> LocalWindowAggregate[109], WindowTableFunction[432] -> Calc[433] -> LocalWindowAggregate[434]), Calc[114] -> (LocalWindowAggregate[115], LocalWindowAggregate[439]), Calc[120] -> LocalWindowAggregate[121], Calc[126] -> LocalWindowAggregate[127], Calc[132] -> LocalWindowAggregate[133], Calc[138] -> LocalWindowAggregate[139], Calc[144] -> (LocalWindowAggregate[145], LocalWindowAggregate[452]), Calc[150] -> LocalWindowAggregate[151], Calc[156] -> LocalWindowAggregate[157], Calc[162] -> LocalWindowAggregate[163], Calc[168] -> LocalWindowAggregate[169], Calc[174] -> LocalWindowAggregate[175], Calc[180] -> LocalWindowAggregate[181], Calc[186] -> LocalWindowAggregate[187], Calc[192] -> LocalWindowAggregate[193], Calc[198] -> LocalWindowAggregate[199], Calc[204] -> (LocalWindowAggregate[205], LocalWindowAggregate[475]), Calc[210] -> (LocalWindowAggregate[211], LocalWindowAggregate[480]), Calc[216] -> LocalWindowAggregate[217], Calc[222] -> LocalWindowAggregate[223], Calc[228] -> LocalWindowAggregate[229], Calc[234] -> WindowTableFunction[235] -> Calc[236] -> LocalWindowAggregate[237], Calc[242] -> LocalWindowAggregate[243], Calc[248] -> LocalWindowAggregate[249], Calc[254] -> LocalWindowAggregate[255], Calc[260] -> LocalWindowAggregate[261], Calc[266] -> LocalWindowAggregate[267], Calc[272] -> LocalWindowAggregate[273], Calc[278] -> WindowTableFunction[279] -> Calc[280] -> LocalWindowAggregate[281], Calc[290] -> LocalWindowAggregate[291], Calc[296] -> LocalWindowAggregate[297], Calc[302] -> LocalWindowAggregate[303], Calc[308] -> LocalWindowAggregate[309], Calc[314] -> LocalWindowAggregate[315], Calc[320] -> LocalWindowAggregate[321], Calc[326] -> WindowTableFunction[327] -> Calc[328] -> LocalWindowAggregate[329], Calc[338] -> LocalWindowAggregate[339], Calc[348] -> LocalWindowAggregate[349], Calc[354] -> LocalWindowAggregate[355], Calc[360] -> LocalWindowAggregate[361]), MiniBatchAssigner[366] -> (Calc[367] -> Alert[368]: Writer -> Alert[368]: Committer, Calc[369] -> Alert[370]: Writer -> Alert[370]: Committer, Calc[371] -> Alert[372]: Writer -> Alert[372]: Committer, Calc[373] -> Alert[374]: Writer -> Alert[374]: Committer, Calc[375] -> Alert[376]: Writer -> Alert[376]: Committer, Calc[377] -> Alert[378]: Writer -> Alert[378]: Committer, Calc[379] -> Alert[380]: Writer -> Alert[380]: Committer)) (1/20)#0 (335c0e92a4914fc7289f6e977222b76b_e3dfc0d7e9ecd8a43f85f0b68ebf3b80_0_0) switched from RUNNING to FAILED with failure cause:
>> java.util.concurrent.TimeoutException: Invocation of [RemoteRpcInvocation(JobMasterOperatorEventGateway.sendOperatorEventToCoordinator(ExecutionAttemptID, OperatorID, SerializedValue))] at recipient [akka.tcp://flink@localhost:6123/user/rpc/jobmanager_3] timed out. This is usually caused by: 1) Akka failed sending the message silently, due to problems like oversized payload or serialization failures. In that case, you should find detailed error information in the logs. 2) The recipient needs more time for responding, due to problems like slow machines or network jitters. In that case, you can try to increase akka.ask.timeout.
>>     at com.sun.proxy.$Proxy25.sendOperatorEventToCoordinator(Unknown Source) ~[?:?]
>>     at org.apache.flink.runtime.taskexecutor.rpc.RpcTaskOperatorEventGateway.sendOperatorEventToCoordinator(RpcTaskOperatorEventGateway.java:60) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.streaming.runtime.tasks.OperatorEventDispatcherImpl$OperatorEventGatewayImpl.sendEventToCoordinator(OperatorEventDispatcherImpl.java:120) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.streaming.runtime.tasks.OperatorChain.lambda$sendAcknowledgeCheckpointEvent$1(OperatorChain.java:947) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at java.util.HashMap$KeySet.forEach(HashMap.java:929) ~[?:?]
>>     at org.apache.flink.streaming.runtime.tasks.OperatorChain.sendAcknowledgeCheckpointEvent(OperatorChain.java:943) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.snapshotState(RegularOperatorChain.java:201) ~[flink-dist-1.17.1.jar:1.17.1]
>>     at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.takeSnapshotSync(SubtaskCheckpointCoordinatorImpl.java:715) ~[flink-dist-1.17.1.jar:1.17.1]
>>
>> Thanks,
>>
>> Neha Rawat
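
For anyone triaging a similar TaskManager log, the Error 1 entries can be summarised with a short script that lists how long each cancelled FETCH request waited compared with its timeout. This is only a sketch: the function name `stalled_fetches` and the regular expression are assumptions based on the log lines quoted in this thread, not part of any Flink or Kafka tooling, and the pattern may need adjusting for other Kafka client versions.

```python
import re

# Matches the "Cancelled in-flight FETCH request" entries quoted under Error 1.
# \s+ tolerates the line wrapping added by mail clients. The pattern is an
# assumption derived from the log lines in this thread.
CANCELLED_FETCH = re.compile(
    r"clientId=(?P<client>[\w-]+),\s+groupId=[^\]]+\]\s+"
    r"Cancelled\s+in-flight\s+FETCH\s+request\b[^)]*?"
    r"elapsed\s+time\s+since\s+send:\s+(?P<elapsed>\d+)ms,\s+"
    r"request\s+timeout:\s+(?P<timeout>\d+)ms"
)


def stalled_fetches(log_text):
    """Return (client_id, elapsed_ms, timeout_ms) for each cancelled FETCH."""
    return [
        (m.group("client"), int(m.group("elapsed")), int(m.group("timeout")))
        for m in CANCELLED_FETCH.finditer(log_text)
    ]


# Three entries copied from Error 1, wrapped the way the mail client wrapped them.
SAMPLE = """\
2023-08-22 22:53:59,441 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer
clientId=Event-16, groupId=Event] Cancelled in-flight FETCH request with
correlation id 445553 due to node 1 being disconnected (elapsed time since
creation: 147696ms, elapsed time since send: 147696ms, request timeout:
30000ms)
2023-08-22 22:53:59,441 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer
clientId=Event-16, groupId=Event] Cancelled in-flight METADATA request with
correlation id 445555 due to node 1 being disconnected (elapsed time since
creation: 1ms, elapsed time since send: 1ms, request timeout: 30000ms)
2023-08-22 22:54:06,975 INFO org.apache.kafka.clients.NetworkClient [] - [Consumer
clientId=Event-19, groupId=Event] Cancelled in-flight FETCH request with
correlation id 446105 due to node 1 being disconnected (elapsed time since
creation: 42100ms, elapsed time since send: 42100ms, request timeout:
30000ms)
"""

if __name__ == "__main__":
    for client, elapsed, timeout in stalled_fetches(SAMPLE):
        print(f"{client}: waited {elapsed} ms against a {timeout} ms timeout")
```

In the entries quoted above, every cancelled FETCH had been outstanding for well over the 30000 ms request timeout, which suggests the consumer threads were not being scheduled for long stretches rather than a problem specific to one Kafka client.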