Re: Task-manager kubernetes pods take a long time to terminate

Yang Wang Thu, 30 Jan 2020 18:22:17 -0800

I think if you want to delete your Flink cluster on K8s, you need to
directly delete all of the created deployments (the jobmanager deployment
and the taskmanager deployment). You can leave the configmap and service
in place if you want to reuse them for the next Flink cluster deployment.
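
As a rough sketch of the above: the deployment names used below
(flink-jobmanager, flink-taskmanager) are assumptions, so substitute the
names your own charts define. The KUBECTL variable and the function name are
also just illustrative; setting KUBECTL=echo previews the command instead of
running it.

```shell
#!/bin/sh
# Sketch only: the deployment names passed in are assumptions, not taken from this thread.
# KUBECTL can be overridden (e.g. KUBECTL=echo) to preview the command instead of running it.
KUBECTL="${KUBECTL:-kubectl}"

# Delete the jobmanager and taskmanager deployments by name.
# ConfigMaps and Services are not touched, so a later cluster deploy can reuse them.
delete_flink_deployments() {
    jm_deploy="$1"
    tm_deploy="$2"
    $KUBECTL delete deployment "$jm_deploy" "$tm_deploy"
}
```

You would then call it as, for example:
delete_flink_deployments flink-jobmanager flink-taskmanager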


What is the status of the taskmanager pod when you delete it and it gets stuck?
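
One possible way to collect that information, as a sketch: the label
app=flink is the one from the Flink docs example quoted below and may differ
in your charts, and the helper names and KUBECTL variable are only
illustrative.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; KUBECTL=echo previews the commands.
KUBECTL="${KUBECTL:-kubectl}"

# List pods matching a label selector, including node and IP columns.
# The STATUS column shows whether a pod is Running, Terminating, etc.
show_flink_pods() {
    $KUBECTL get pods -l "$1" -o wide
}

# Print the detailed state and recent events of a single pod.
describe_pod() {
    $KUBECTL describe pod "$1"
}
```

For example, run show_flink_pods app=flink and then call describe_pod with
the name of the stuck taskmanager pod.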


Best,
Yang

On Fri, Jan 31, 2020 at 4:51 AM Li Peng <li.p...@doordash.com> wrote:

> Hi Yun,
>
> I'm currently specifying that specific RPC address in my kubernetes charts
> for convenience. Should I be generating a new one for every deployment?
>
> And yes, I am deleting the pods using those commands. I'm just noticing
> that the task-manager termination process is short-circuited by the
> registration timeout check, so instead of terminating quickly, the
> task-manager waits 5 minutes for the timeout before terminating. I
> expected it to just terminate without going through the registration
> timeout. Is there a way to configure that?
>
> Thanks,
> Li
>
>
> On Thu, Jan 30, 2020 at 8:53 AM Yun Tang <myas...@live.com> wrote:
>
>> Hi Li
>>
>> Why do you still use 'job-manager' as the jobmanager.rpc.address for the
>> second, new cluster? If you used another RPC address, the previous task
>> managers would not try to register with the old one.
>>
>> Take the Flink documentation [1] for K8s as an example. You can list or
>> delete all pods like this:
>>
>> kubectl get pods -l app=flink
>> kubectl delete pods -l app=flink
>>
>>
>> By the way, the default registration timeout is 5 minutes [2]; any
>> taskmanager that cannot register with the JM will shut itself down after
>> 5 minutes.
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/kubernetes.html#session-cluster-resource-definitions
>> [2]
>> https://github.com/apache/flink/blob/7e1a0f446e018681cb537dd936ae54388b5a7523/flink-core/src/main/java/org/apache/flink/configuration/TaskManagerOptions.java#L158
>>
>> Best
>> Yun Tang
>>
>> ------------------------------
>> *From:* Li Peng <li.p...@doordash.com>
>> *Sent:* Thursday, January 30, 2020 9:24
>> *To:* user <user@flink.apache.org>
>> *Subject:* Task-manager kubernetes pods take a long time to terminate
>>
>> Hey folks, I'm deploying a Flink cluster via kubernetes, and starting
>> each task manager with taskmanager.sh. I noticed that when I tell kubectl
>> to delete the deployment, the job-manager pod usually terminates very
>> quickly, but any task-manager that doesn't get terminated before the
>> job-manager, usually gets stuck in this loop:
>>
>> 2020-01-29 09:18:47,867 INFO
>>  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not
>> resolve ResourceManager address 
>> akka.tcp://flink@job-manager:6123/user/resourcemanager,
>> retrying in 10000 ms: Could not connect to rpc endpoint under address
>> akka.tcp://flink@job-manager:6123/user/resourcemanager
>>
>> It then does this for about 10 minutes(?), and then shuts down. If I'm
>> deploying a new cluster, this pod will try to register itself with the new
>> job manager before terminating later. This isn't a troubling issue as far as
>> I can tell, but I find it annoying that I sometimes have to force delete
>> the pods.
>>
>> Any easy ways to just have the task managers terminate gracefully and
>> quickly?
>>
>> Thanks,
>> Li
>>
>
