RocksDB StateBuilder unexpected exception

2021-03-19 Thread dhanesh arole
cutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]* - Dhanesh Arole

Re: RocksDB StateBuilder unexpected exception

2021-03-23 Thread dhanesh arole
Hi Matthias, Thanks for taking the time to help us with this. You are right: there were lots of task cancellations, because this exception causes the job to be restarted, which triggers cancellations. - Dhanesh Arole On Tue, Mar 23, 2021 at 9:27 AM Matthias Pohl wrote: > Hi Danesh, > thanks for reachi

Task manager local state data after crash / recovery

2021-04-06 Thread dhanesh arole
ow are you handling these operational tasks currently? Configuration:
state.backend.local-recovery: true
taskmanager.state.local.root-dirs: /data/flink/
RocksDB backend DB storage path: /data/flink (set programmatically) - Dhanesh Arole
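The two configuration keys quoted above are ordinary `flink-conf.yaml` entries. A minimal sketch, assuming you want to generate those entries from code (the class and method names are hypothetical), might look like this. The RocksDB storage path is omitted because the message says it is set programmatically on the state backend, not in the config file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper: renders the task-local recovery settings quoted in the
 * message as flink-conf.yaml lines. Only the two keys come from the message;
 * everything else is illustrative.
 */
public class LocalRecoveryConf {

    static Map<String, String> localRecoverySettings(String localRootDir) {
        // LinkedHashMap keeps insertion order so the rendered output is stable.
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("state.backend.local-recovery", "true");
        conf.put("taskmanager.state.local.root-dirs", localRootDir);
        return conf;
    }

    static String toFlinkConfYaml(Map<String, String> conf) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            // flink-conf.yaml uses simple "key: value" lines.
            sb.append(e.getKey()).append(": ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toFlinkConfYaml(localRecoverySettings("/data/flink/")));
    }
}
```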

Re: How to know if task-local recovery kicked in for some nodes?

2021-04-06 Thread dhanesh arole
data kicks in when the TaskManager process is alive but one of the tasks fails for some other reason (such as a timeout from a sink or an external dependency) and the Flink job is restarted by the JobManager. Please correct me if I'm wrong. - Dhanesh Arole On Tue, Apr 6, 2021 at 11:35 AM Till Rohrmann wrote:

Re: Flink Taskmanager failure recovery and large state

2021-04-07 Thread dhanesh arole
problem, we increased *akka.ask.timeout* to 10m. This gives task managers enough time to wait for RPC responses from other task managers during a restart. As a result, a TM is more lenient about marking another TM as failed and cancelling the job in the first place. - Dhanesh Arole On Tue, Apr 6, 2

Re: How to know if task-local recovery kicked in for some nodes?

2021-04-07 Thread dhanesh arole
them also in case of a global failover. Only > those tasks which have been executed on the lost TaskManager will need new > slots and have to download the state from the remote storage. > > Cheers, > Till > > On Tue, Apr 6, 2021 at 5:35 PM dhanesh arole > wrote: > >
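Till's explanation can be illustrated with a simplified model. This is not Flink's scheduler code, and it assumes each surviving task is redeployed to its previous slot. Tasks whose previous TaskManager is still alive restore from task-local state; tasks from a lost TaskManager get new slots and restore from remote storage. All names here are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/**
 * Simplified model of task-local recovery after a failover (assumption: surviving
 * tasks are redeployed to the same TaskManager they ran on before).
 */
public class LocalRecoveryModel {

    enum RestoreSource { LOCAL, REMOTE }

    /** Maps each task to where its state would be restored from. */
    static Map<String, RestoreSource> planRestore(Map<String, String> taskToTaskManager,
                                                  Set<String> lostTaskManagers) {
        Map<String, RestoreSource> plan = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : taskToTaskManager.entrySet()) {
            // Only tasks that ran on a lost TaskManager must download remote state.
            boolean tmLost = lostTaskManagers.contains(e.getValue());
            plan.put(e.getKey(), tmLost ? RestoreSource.REMOTE : RestoreSource.LOCAL);
        }
        return plan;
    }

    public static void main(String[] args) {
        Map<String, String> assignment = new LinkedHashMap<>();
        assignment.put("map-0", "tm-1");
        assignment.put("map-1", "tm-2");
        assignment.put("map-2", "tm-1");
        System.out.println(planRestore(assignment, Set.of("tm-2")));
    }
}
```

With `tm-2` lost, only `map-1` is planned as a remote restore; the two tasks on `tm-1` reuse local state.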

Re: Task manager local state data after crash / recovery

2021-04-09 Thread dhanesh arole
in case of a hard process stop. Cleaning this state up is at > the moment unfortunately the responsibility of the user. > > Cheers, > Till > > On Tue, Apr 6, 2021 at 4:55 PM dhanesh arole > wrote: > >> Hey all, >> >> We are running a stateful stream processing job o
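Since cleaning up leftover local state after a hard process stop is left to the user, one option is a pre-start cleanup step. The sketch below is hypothetical and not a Flink utility. It assumes the local state root (for example `/data/flink/`) is used by no running TaskManager when it runs. It deletes everything below the root but keeps the root directory itself.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

/**
 * Hypothetical pre-start cleanup for a TaskManager local state directory.
 * Run it only while no TaskManager process is using the directory.
 */
public class LocalStateCleanup {

    /** Deletes all entries below root (not root itself); returns how many were deleted. */
    static long cleanChildren(Path root) throws IOException {
        if (!Files.isDirectory(root)) {
            return 0; // nothing to clean
        }
        long deleted = 0;
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse lexicographic order visits children before their parent directories.
            Path[] ordered = paths.filter(p -> !p.equals(root))
                                  .sorted(Comparator.reverseOrder())
                                  .toArray(Path[]::new);
            for (Path p : ordered) {
                Files.delete(p);
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            // Require an explicit path so the tool never deletes a default location by accident.
            System.err.println("usage: LocalStateCleanup <local-state-root>");
            return;
        }
        Path root = Paths.get(args[0]);
        System.out.println("Deleted " + cleanChildren(root) + " entries under " + root);
    }
}
```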

Flink cluster on k8s with rocksdb state backend

2019-10-17 Thread dhanesh arole
Hello all, I am trying to provision a Flink cluster on k8s. Some of the jobs in our existing cluster use the RocksDB state backend. I would like to look at a Flink Helm chart or deployment manifests that provision task managers with dynamic PVs, and see how they manage them. We are running on kops manage

Re: Flink grpc-netty-shaded NoClassDefFoundError

2019-10-22 Thread dhanesh arole
Just to give you more context, we are also using `com.google.cloud.bigtable` in our job dependencies. Could this be caused by a shade plugin issue with `bigtable-hbase-2.x`? - Dhanesh Arole On Tue, Oct 22, 2019 at 2:06 PM dhanesh arole wrote: > He

Flink Savepoint fault tolerance

2021-04-16 Thread dhanesh arole
st be already handled, but I just wanted to confirm and get help finding the relevant code references, so I can dig deeper and understand it in depth from an educational point of view. - Dhanesh Arole

Re: Flink Savepoint fault tolerance

2021-04-21 Thread dhanesh arole
ase of job restarts or TM failures? >> > Savepoints have to be triggered anew. Savepoints are meant as a purely > manual feature. Again, you could automate it, if you look at the logs. > > Best, > > Arvid > > > On Fri, Apr 16, 2021 at 12:33 PM dhanesh arole
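Arvid suggests automating savepoints by watching the logs. A different route, swapped in here, is Flink's REST API, which accepts `POST /jobs/:jobid/savepoints` with a JSON body containing `target-directory` and `cancel-job`, and replies with a request id you can poll. The sketch below only builds and sends that request. The REST base URL (Flink's default REST port is 8081), the job id, and the target directory are assumptions you supply, and the class name is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical helper that triggers a savepoint through Flink's REST API. */
public class SavepointTrigger {

    static HttpRequest buildTriggerRequest(String restBase, String jobId, String targetDir) {
        // Hand-written JSON; assumes targetDir contains no characters that need escaping.
        String body = "{\"target-directory\": \"" + targetDir + "\", \"cancel-job\": false}";
        return HttpRequest.newBuilder(URI.create(restBase + "/jobs/" + jobId + "/savepoints"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.err.println("usage: SavepointTrigger <rest-base-url> <job-id> <target-dir>");
            return;
        }
        HttpRequest req = buildTriggerRequest(args[0], args[1], args[2]);
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
        // Print status and raw body; the body should contain the id to poll for completion.
        System.out.println(resp.statusCode() + " " + resp.body());
    }
}
```

Run from a scheduler (cron, a Kubernetes CronJob, or similar), this re-triggers savepoints after restarts without manual intervention.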

Re: Flink streaming job's taskmanager process killed by yarn nodemanager because of exceeding 'PHYSICAL' memory limit

2021-04-22 Thread dhanesh arole
o jemalloc, the memory footprint improved dramatically. - Dhanesh Arole On Thu, Apr 22, 2021 at 1:39 PM Matthias Pohl wrote: > Hi, > I have a few questions about your case: > * What is the option you're referring to for the bound

Flink docker on k8s job submission timeout

2021-11-10 Thread dhanesh arole
] at akka.actor.LightArrayRevolverScheduler$$anon$3.executeBucket$1(LightArrayRevolverScheduler.scala:279) ~[flink.jar:?] at akka.actor.LightArrayRevolverScheduler$$anon$3.nextTick(LightArrayRevolverScheduler.scala:283) ~[flink.jar:?]* - Dhanesh Arole