I guess something is wrong with your kube-proxy, which prevents the TaskManager
from connecting to the JobManager.
You could verify this by using the JobManager pod IP directly instead of the
service name.
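A quick way to check whether it is name resolution that fails (the pod name is the one from this thread; the kubectl lines are only a sketch and are commented out because they need a live cluster):

```shell
# Compare service-name resolution vs. pod-IP reachability from inside the
# TaskManager pod (run these manually against your cluster):
# kubectl exec -ti flink-taskmanager-74c68c6f48-jqpbn -- getent hosts flink-jobmanager
# kubectl exec -ti flink-taskmanager-74c68c6f48-jqpbn -- ping -c1 172.18.0.5

# Locally runnable sanity check of the same resolver call:
getent hosts localhost >/dev/null && echo "resolver OK for localhost"
```

If the getent on the service name returns nothing while the ping to the pod IP succeeds, cluster DNS is broken rather than Flink.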

Please do as follows.
* Edit the TaskManager deployment (via 'kubectl edit deployment flink-taskmanager')
and update the args field to the following, where "172.18.0.5" is the
JobManager pod IP:
   args: ["taskmanager", "-Djobmanager.rpc.address=172.18.0.5"]
* Delete the current TaskManager pod and let it restart
* Check the TaskManager logs to see whether it registers successfully
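The same edit can also be applied non-interactively with 'kubectl patch'; a sketch, where the IP and the pod labels come from the YAML in this thread, and the kubectl calls themselves are commented out since they need a live cluster:

```shell
JM_IP="172.18.0.5"  # JobManager pod IP from this thread; replace with yours (kubectl get pod -o wide)

# Strategic-merge patch that overrides the TaskManager container args:
PATCH='{"spec":{"template":{"spec":{"containers":[{"name":"taskmanager","args":["taskmanager","-Djobmanager.rpc.address='"$JM_IP"'"]}]}}}}'
echo "$PATCH"

# kubectl patch deployment flink-taskmanager -p "$PATCH"
# kubectl delete pod -l app=flink,component=taskmanager   # the deployment recreates the pod with the new args
# kubectl logs -f deployment/flink-taskmanager            # watch for the registration to succeed
```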



Best,
Yang

superainbower <superainbo...@163.com> wrote on Thu, Sep 3, 2020, at 9:35 AM:

> Hi Till,
> I found something that may be helpful.
> The Kubernetes dashboard shows the job-manager IP as 172.18.0.5 and the
> task-manager IP as 172.18.0.6.
> When I run 'kubectl exec -ti flink-taskmanager-74c68c6f48-jqpbn --
> /bin/bash' and then 'ping 172.18.0.5',
> I get a response.
> But when I ping flink-jobmanager, there is no response.
>
> superainbower
> superainbo...@163.com
>
>
> On 09/3/2020 09:03, superainbower <superainbo...@163.com> wrote:
>
> Hi Till,
> This is the TaskManager log.
> As you can see, it prints 'line 92 -- Could not connect to
> flink-jobmanager:6123',
> then repeatedly prints 'line 128 -- Could not resolve ResourceManager address
> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*,
> retrying in 10000 ms: Could not connect to rpc endpoint under address
> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.'
>
> A few minutes later, the TaskManager shuts down and restarts.
>
> These are my YAML files; could you help me confirm whether I omitted something?
> Thanks a lot!
> ---------------------------------------------------
> flink-configuration-configmap.yaml
> apiVersion: v1
> kind: ConfigMap
> metadata:
>   name: flink-config
>   labels:
>     app: flink
> data:
>   flink-conf.yaml: |+
>     jobmanager.rpc.address: flink-jobmanager
>     taskmanager.numberOfTaskSlots: 1
>     blob.server.port: 6124
>     jobmanager.rpc.port: 6123
>     taskmanager.rpc.port: 6122
>     queryable-state.proxy.ports: 6125
>     jobmanager.memory.process.size: 1024m
>     taskmanager.memory.process.size: 1024m
>     parallelism.default: 1
>   log4j-console.properties: |+
>     rootLogger.level = INFO
>     rootLogger.appenderRef.console.ref = ConsoleAppender
>     rootLogger.appenderRef.rolling.ref = RollingFileAppender
>     logger.akka.name = akka
>     logger.akka.level = INFO
>     logger.kafka.name= org.apache.kafka
>     logger.kafka.level = INFO
>     logger.hadoop.name = org.apache.hadoop
>     logger.hadoop.level = INFO
>     logger.zookeeper.name = org.apache.zookeeper
>     logger.zookeeper.level = INFO
>     appender.console.name = ConsoleAppender
>     appender.console.type = CONSOLE
>     appender.console.layout.type = PatternLayout
>     appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
>     appender.rolling.name = RollingFileAppender
>     appender.rolling.type = RollingFile
>     appender.rolling.append = false
>     appender.rolling.fileName = ${sys:log.file}
>     appender.rolling.filePattern = ${sys:log.file}.%i
>     appender.rolling.layout.type = PatternLayout
>     appender.rolling.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
>     appender.rolling.policies.type = Policies
>     appender.rolling.policies.size.type = SizeBasedTriggeringPolicy
>     appender.rolling.policies.size.size = 100MB
>     appender.rolling.strategy.type = DefaultRolloverStrategy
>     appender.rolling.strategy.max = 10
>     logger.netty.name = org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline
>     logger.netty.level = OFF
> ---------------------------------------------------
> jobmanager-service.yaml
> apiVersion: v1
> kind: Service
> metadata:
>   name: flink-jobmanager
> spec:
>   type: ClusterIP
>   ports:
>   - name: rpc
>     port: 6123
>   - name: blob-server
>     port: 6124
>   - name: webui
>     port: 8081
>   selector:
>     app: flink
>     component: jobmanager
> --------------------------------------------------
> jobmanager-session-deployment.yaml
> apiVersion: apps/v1
> kind: Deployment
> metadata:
>   name: flink-jobmanager
> spec:
>   replicas: 1
>   selector:
>     matchLabels:
>       app: flink
>       component: jobmanager
>   template:
>     metadata:
>       labels:
>         app: flink
>         component: jobmanager
>     spec:
>       containers:
>       - name: jobmanager
>         image: registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
>         args: ["jobmanager"]
>         ports:
>         - containerPort: 6123
>           name: rpc
>         - containerPort: 6124
>           name: blob-server
>         - containerPort: 8081
>           name: webui
>         livenessProbe:
>           tcpSocket:
>             port: 6123
>           initialDelaySeconds: 30
>           periodSeconds: 60
>         volumeMounts:
>         - name: flink-config-volume
>           mountPath: /opt/flink/conf
>         securityContext:
>           runAsUser: 9999  # refers to user _flink_ from official flink image, change if necessary
>       volumes:
>       - name: flink-config-volume
>         configMap:
>           name: flink-config
>           items:
>           - key: flink-conf.yaml
>             path: flink-conf.yaml
>           - key: log4j-console.properties
>             path: log4j-console.properties
>       imagePullSecrets:
>         - name: regcred
> ---------------------------------------------------
> taskmanager-session-deployment.yaml
> apiVersion: apps/v1
> kind: Deployment
> metadata:
>   name: flink-taskmanager
> spec:
>   replicas: 1
>   selector:
>     matchLabels:
>       app: flink
>       component: taskmanager
>   template:
>     metadata:
>       labels:
>         app: flink
>         component: taskmanager
>     spec:
>       containers:
>       - name: taskmanager
>         image: registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
>         args: ["taskmanager"]
>         ports:
>         - containerPort: 6122
>           name: rpc
>         - containerPort: 6125
>           name: query-state
>         livenessProbe:
>           tcpSocket:
>             port: 6122
>           initialDelaySeconds: 30
>           periodSeconds: 60
>         volumeMounts:
>         - name: flink-config-volume
>           mountPath: /opt/flink/conf/
>         securityContext:
>           runAsUser: 9999  # refers to user _flink_ from official flink image, change if necessary
>       volumes:
>       - name: flink-config-volume
>         configMap:
>           name: flink-config
>           items:
>           - key: flink-conf.yaml
>             path: flink-conf.yaml
>           - key: log4j-console.properties
>             path: log4j-console.properties
>       imagePullSecrets:
>         - name: regcred
>
>
>
> On 09/2/2020 20:38, Till Rohrmann <trohrm...@apache.org> wrote:
>
> Hmm, this is indeed strange. Could you share the logs of the TaskManager
> with us? Ideally you set the log level to debug. Thanks a lot.
>
> Cheers,
> Till
>
> On Wed, Sep 2, 2020 at 12:45 PM art <superainbo...@163.com> wrote:
>
>> Hi Till,
>>
>> The full output when I run 'kubectl get all' looks like this:
>>
>> NAME                                     READY   STATUS    RESTARTS   AGE
>> pod/flink-jobmanager-85bdbd98d8-ppjmf    1/1     Running   0
>>  2m34s
>> pod/flink-taskmanager-74c68c6f48-6jb5v   1/1     Running   0
>>  2m34s
>>
>> NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP
>> PORT(S)                      AGE
>> service/flink-jobmanager   ClusterIP   10.103.207.75   <none>
>>  6123/TCP,6124/TCP,8081/TCP   2m34s
>> service/kubernetes         ClusterIP   10.96.0.1       <none>
>>  443/TCP                      5d2h
>>
>> NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
>> deployment.apps/flink-jobmanager    1/1     1            1           2m34s
>> deployment.apps/flink-taskmanager   1/1     1            1           2m34s
>>
>> NAME                                           DESIRED   CURRENT   READY
>>   AGE
>> replicaset.apps/flink-jobmanager-85bdbd98d8    1         1         1
>>   2m34s
>> replicaset.apps/flink-taskmanager-74c68c6f48   1         1         1
>>   2m34s
>>
>> And I can open the Flink UI, but the task manager count is 0, so the
>> job manager works well.
>> I think the problem is that the taskmanager cannot register itself with the
>> jobmanager; did I miss some configuration?
>>
>>
>> On Sep 2, 2020, at 5:24 PM, Till Rohrmann <trohrm...@apache.org> wrote:
>>
>> Hi art,
>>
>> could you check what `kubectl get services` returns? Usually, if you run
>> `kubectl get all` you should also see the services, but in your case no
>> services are listed. You should see something like
>> service/flink-jobmanager; otherwise the flink-jobmanager service (the K8s
>> Service) is not running.
>>
>> Cheers,
>> Till
>>
>> On Wed, Sep 2, 2020 at 11:15 AM art <superainbo...@163.com> wrote:
>>
>>> Hi Till,
>>>
>>> I'm sure the jobmanager-service is started; I can find it in the Kubernetes
>>> Dashboard.
>>>
>>> When I run 'kubectl get deployment' I get this:
>>> flink-jobmanager    1/1     1            1           33s
>>> flink-taskmanager   1/1     1            1           33s
>>>
>>> When I run 'kubectl get all' I get this:
>>> NAME                                     READY   STATUS    RESTARTS   AGE
>>> pod/flink-jobmanager-85bdbd98d8-ppjmf    1/1     Running   0
>>>  2m34s
>>> pod/flink-taskmanager-74c68c6f48-6jb5v   1/1     Running   0
>>>  2m34s
>>>
>>> So I think flink-jobmanager works well, but the taskmanager is restarted
>>> every few minutes.
>>>
>>> My minikube version: v1.12.3
>>> Flink version:v1.11.1
>>>
>>> On Sep 2, 2020, at 4:27 PM, Till Rohrmann <trohrm...@apache.org> wrote:
>>>
>>> Hi art,
>>>
>>> could you verify that the jobmanager-service has been started? It looks
>>> as if the name flink-jobmanager is not resolvable. It would also help to
>>> know the Minikube and K8s versions you are using.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Wed, Sep 2, 2020 at 9:50 AM art <superainbo...@163.com> wrote:
>>>
>>>> Hi, I'm going to deploy Flink on Minikube, referring to
>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.11/zh/ops/deployment/kubernetes.html
>>>> and running:
>>>> kubectl create -f flink-configuration-configmap.yaml
>>>> kubectl create -f jobmanager-service.yaml
>>>> kubectl create -f jobmanager-session-deployment.yaml
>>>> kubectl create -f taskmanager-session-deployment.yaml
>>>>
>>>> But I got this
>>>>
>>>> 2020-09-02 06:45:42,664 WARN  akka.remote.ReliableDeliverySupervisor
>>>>                     [] - Association with remote system [
>>>> akka.tcp://flink@flink-jobmanager:6123] has failed, address is now
>>>> gated for [50] ms. Reason: [Association failed with [
>>>> akka.tcp://flink@flink-jobmanager:6123]] Caused by:
>>>> [java.net.UnknownHostException: flink-jobmanager: Temporary failure in name
>>>> resolution]
>>>> 2020-09-02 06:45:42,691 INFO
>>>>  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could
>>>> not resolve ResourceManager address
>>>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*,
>>>> retrying in 10000 ms: Could not connect to rpc endpoint under address
>>>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
>>>> 2020-09-02 06:46:02,731 INFO
>>>>  org.apache.flink.runtime.taskexecutor.TaskExecutor           [] - Could
>>>> not resolve ResourceManager address
>>>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*,
>>>> retrying in 10000 ms: Could not connect to rpc endpoint under address
>>>> akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
>>>> 2020-09-02 06:46:12,731 INFO  akka.remote.transport.ProtocolStateActor
>>>>                     [] - No response from remote for outbound association.
>>>> Associate timed out after [20000 ms].
>>>>
>>>> And when I run 'kubectl exec -ti flink-taskmanager-74c68c6f48-9tkvd --
>>>> /bin/bash' and then 'ping flink-jobmanager', I find I cannot ping
>>>> flink-jobmanager from the taskmanager.
>>>>
>>>> I am new to K8s; can anyone point me to a tutorial? Thanks a lot!
>>>>
>>>
>>>
>>
