Hi Till,

I found something that may be helpful. The Kubernetes Dashboard shows the JobManager IP as 172.18.0.5 and the TaskManager IP as 172.18.0.6.

When I run 'kubectl exec -ti flink-taskmanager-74c68c6f48-jqpbn -- /bin/bash' and then 'ping 172.18.0.5', I get a response. But when I ping flink-jobmanager, there is no response.
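A ping by Service name can fail for two different reasons: the name does not resolve at all (which is what the UnknownHostException in the original log further down the thread suggests), or the name resolves but the Service's virtual ClusterIP simply does not answer ICMP, which is commonly the case when kube-proxy runs in iptables mode. A rough way to tell the two apart is to resolve the name first and then try a TCP connection on the RPC port. The sketch below does that in Python; it assumes a Python interpreter is available wherever the check is run (nothing in this thread says the Flink image ships one), and the function name `check_endpoint` is made up for this illustration.

```python
import socket


def check_endpoint(host, port, timeout=3.0):
    """Report which stage fails when reaching host:port: 'dns', 'tcp', or 'ok'."""
    try:
        # Step 1: name resolution, the same step 'ping <name>' has to pass first.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return {"stage": "dns", "address": None, "error": str(exc)}

    # Use the first resolved address for the connection attempt.
    address = infos[0][4][0]
    try:
        # Step 2: plain TCP connect to the resolved address on the given port.
        with socket.create_connection((address, port), timeout=timeout):
            return {"stage": "ok", "address": address, "error": None}
    except OSError as exc:
        return {"stage": "tcp", "address": address, "error": str(exc)}
```

Calling `check_endpoint("flink-jobmanager", 6123)` from inside the TaskManager pod (host and port taken from the flink-conf.yaml in this thread) would return stage "dns" if the name cannot be resolved, and stage "tcp" if it resolves but the port is unreachable.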
superainbower | superainbo...@163.com

On 09/3/2020 09:03, superainbower <superainbo...@163.com> wrote:

Hi Till,

This is the TaskManager log. As you can see, the log prints (line 92) 'Could not connect to flink-jobmanager:6123' and then (line 128) 'Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.' It then keeps repeating this message. A few minutes later, the TaskManager shuts down and restarts.

These are my YAML files. Could you help me check whether I omitted something? Thanks a lot!

---------------------------------------------------
flink-configuration-configmap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: flink-config
  labels:
    app: flink
data:
  flink-conf.yaml: |+
    jobmanager.rpc.address: flink-jobmanager
    taskmanager.numberOfTaskSlots: 1
    blob.server.port: 6124
    jobmanager.rpc.port: 6123
    taskmanager.rpc.port: 6122
    queryable-state.proxy.ports: 6125
    jobmanager.memory.process.size: 1024m
    taskmanager.memory.process.size: 1024m
    parallelism.default: 1
  log4j-console.properties: |+
    rootLogger.level = INFO
    rootLogger.appenderRef.console.ref = ConsoleAppender
    rootLogger.appenderRef.rolling.ref = RollingFileAppender
    logger.akka.name = akka
    logger.akka.level = INFO
    logger.kafka.name= org.apache.kafka
    logger.kafka.level = INFO
    logger.hadoop.name = org.apache.hadoop
    logger.hadoop.level = INFO
    logger.zookeeper.name = org.apache.zookeeper
    logger.zookeeper.level = INFO
    appender.console.name = ConsoleAppender
    appender.console.type = CONSOLE
    appender.console.layout.type = PatternLayout
    appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
    appender.rolling.name = RollingFileAppender
    appender.rolling.type = RollingFile
    appender.rolling.append = false
    appender.rolling.fileName = ${sys:log.file}
    appender.rolling.filePattern = ${sys:log.file}.%i
    appender.rolling.layout.type = PatternLayout
    appender.rolling.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
    appender.rolling.policies.type = Policies
    appender.rolling.policies.size.type = SizeBasedTriggeringPolicy
    appender.rolling.policies.size.size=100MB
    appender.rolling.strategy.type = DefaultRolloverStrategy
    appender.rolling.strategy.max = 10
    logger.netty.name = org.apache.flink.shaded.akka.org.jboss.netty.channel.DefaultChannelPipeline
    logger.netty.level = OFF

---------------------------------------------------
jobmanager-service.yaml

apiVersion: v1
kind: Service
metadata:
  name: flink-jobmanager
spec:
  type: ClusterIP
  ports:
  - name: rpc
    port: 6123
  - name: blob-server
    port: 6124
  - name: webui
    port: 8081
  selector:
    app: flink
    component: jobmanager

--------------------------------------------------
jobmanager-session-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-jobmanager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flink
      component: jobmanager
  template:
    metadata:
      labels:
        app: flink
        component: jobmanager
    spec:
      containers:
      - name: jobmanager
        image: registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
        args: ["jobmanager"]
        ports:
        - containerPort: 6123
          name: rpc
        - containerPort: 6124
          name: blob-server
        - containerPort: 8081
          name: webui
        livenessProbe:
          tcpSocket:
            port: 6123
          initialDelaySeconds: 30
          periodSeconds: 60
        volumeMounts:
        - name: flink-config-volume
          mountPath: /opt/flink/conf
        securityContext:
          runAsUser: 9999  # refers to user _flink_ from official flink image, change if necessary
      volumes:
      - name: flink-config-volume
        configMap:
          name: flink-config
          items:
          - key: flink-conf.yaml
            path: flink-conf.yaml
          - key: log4j-console.properties
            path: log4j-console.properties
      imagePullSecrets:
      - name: regcred

---------------------------------------------------
taskmanager-session-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-taskmanager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flink
      component: taskmanager
  template:
    metadata:
      labels:
        app: flink
        component: taskmanager
    spec:
      containers:
      - name: taskmanager
        image: registry.cn-hangzhou.aliyuncs.com/superainbower/flink:1.11.1
        args: ["taskmanager"]
        ports:
        - containerPort: 6122
          name: rpc
        - containerPort: 6125
          name: query-state
        livenessProbe:
          tcpSocket:
            port: 6122
          initialDelaySeconds: 30
          periodSeconds: 60
        volumeMounts:
        - name: flink-config-volume
          mountPath: /opt/flink/conf/
        securityContext:
          runAsUser: 9999  # refers to user _flink_ from official flink image, change if necessary
      volumes:
      - name: flink-config-volume
        configMap:
          name: flink-config
          items:
          - key: flink-conf.yaml
            path: flink-conf.yaml
          - key: log4j-console.properties
            path: log4j-console.properties
      imagePullSecrets:
      - name: regcred

superainbower | superainbo...@163.com

On 09/2/2020 20:38, Till Rohrmann <trohrm...@apache.org> wrote:

Hmm, this is indeed strange. Could you share the logs of the TaskManager with us? Ideally, set the log level to debug. Thanks a lot.

Cheers,
Till

On Wed, Sep 2, 2020 at 12:45 PM art <superainbo...@163.com> wrote:

Hi Till,

This is the full output when I run 'kubectl get all':

NAME                                     READY   STATUS    RESTARTS   AGE
pod/flink-jobmanager-85bdbd98d8-ppjmf    1/1     Running   0          2m34s
pod/flink-taskmanager-74c68c6f48-6jb5v   1/1     Running   0          2m34s

NAME                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
service/flink-jobmanager   ClusterIP   10.103.207.75   <none>        6123/TCP,6124/TCP,8081/TCP   2m34s
service/kubernetes         ClusterIP   10.96.0.1       <none>        443/TCP                      5d2h

NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/flink-jobmanager    1/1     1            1           2m34s
deployment.apps/flink-taskmanager   1/1     1            1           2m34s

NAME                                           DESIRED   CURRENT   READY   AGE
replicaset.apps/flink-jobmanager-85bdbd98d8    1         1         1       2m34s
replicaset.apps/flink-taskmanager-74c68c6f48   1         1         1       2m34s

I can also open the Flink UI, but it shows 0 TaskManagers, so I think the JobManager itself works well. I think the problem is that the TaskManager cannot register itself with the JobManager. Did I miss some configuration?
On Sep 2, 2020, at 5:24 PM, Till Rohrmann <trohrm...@apache.org> wrote:

Hi art,

could you check what `kubectl get services` returns? Usually, if you run `kubectl get all`, you should also see the services. But in your case there are no services listed. You should see something like service/flink-jobmanager; otherwise the flink-jobmanager service (K8s service) is not running.

Cheers,
Till

On Wed, Sep 2, 2020 at 11:15 AM art <superainbo...@163.com> wrote:

Hi Till,

I'm sure the jobmanager-service is started; I can find it in the Kubernetes Dashboard.

When I run 'kubectl get deployment' I get this:

flink-jobmanager    1/1   1   1   33s
flink-taskmanager   1/1   1   1   33s

When I run 'kubectl get all' I get this:

NAME                                     READY   STATUS    RESTARTS   AGE
pod/flink-jobmanager-85bdbd98d8-ppjmf    1/1     Running   0          2m34s
pod/flink-taskmanager-74c68c6f48-6jb5v   1/1     Running   0          2m34s

So I think flink-jobmanager works well, but the TaskManager is restarted every few minutes.

My minikube version: v1.12.3
Flink version: v1.11.1

On Sep 2, 2020, at 4:27 PM, Till Rohrmann <trohrm...@apache.org> wrote:

Hi art,

could you verify that the jobmanager-service has been started? It looks as if the name flink-jobmanager is not resolvable. It could also help to know the Minikube and K8s version you are using.

Cheers,
Till

On Wed, Sep 2, 2020 at 9:50 AM art <superainbo...@163.com> wrote:

Hi,

I'm trying to deploy Flink on minikube, following
https://ci.apache.org/projects/flink/flink-docs-release-1.11/zh/ops/deployment/kubernetes.html

kubectl create -f flink-configuration-configmap.yaml
kubectl create -f jobmanager-service.yaml
kubectl create -f jobmanager-session-deployment.yaml
kubectl create -f taskmanager-session-deployment.yaml

But I got this:

2020-09-02 06:45:42,664 WARN akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@flink-jobmanager:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@flink-jobmanager:6123]] Caused by: [java.net.UnknownHostException: flink-jobmanager: Temporary failure in name resolution]
2020-09-02 06:45:42,691 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
2020-09-02 06:46:02,731 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor [] - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/rpc/resourcemanager_*.
2020-09-02 06:46:12,731 INFO akka.remote.transport.ProtocolStateActor [] - No response from remote for outbound association. Associate timed out after [20000 ms].

And when I run 'kubectl exec -ti flink-taskmanager-74c68c6f48-9tkvd -- /bin/bash' and then 'ping flink-jobmanager', I find that I cannot ping flink-jobmanager from the TaskManager.

I am new to k8s. Can anyone point me to a tutorial? Thanks a lot!
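
The "Caused by" part of the Akka warning in the log above already hints at the kind of failure: an UnknownHostException points at name resolution rather than at a closed or unreachable port. As a small illustration, the Python sketch below pulls that cause out of a log line with a regular expression. The function name `failure_cause` and the coarse "dns" / "other" classification are assumptions made for this example, not something Flink or Akka provides.

```python
import re

# Matches the bracketed "Caused by: [<exception class>: <message>]" part of the warning.
CAUSE_RE = re.compile(r"Caused by: \[(?P<exc>[\w.$]+): (?P<msg>[^\]]*)\]")


def failure_cause(log_line):
    """Return 'dns', 'other', or None when the line carries no 'Caused by' part."""
    match = CAUSE_RE.search(log_line)
    if match is None:
        return None
    # UnknownHostException means the host name could not be resolved at all.
    if match.group("exc").endswith("UnknownHostException"):
        return "dns"
    return "other"


# The warning from the log above, joined into one string.
line = (
    "Association with remote system [akka.tcp://flink@flink-jobmanager:6123] has failed, "
    "address is now gated for [50] ms. Reason: [Association failed with "
    "[akka.tcp://flink@flink-jobmanager:6123]] Caused by: "
    "[java.net.UnknownHostException: flink-jobmanager: Temporary failure in name resolution]"
)
print(failure_cause(line))  # prints "dns"
```

For the warning in this thread the result is "dns", which matches the suspicion that the name flink-jobmanager is not resolvable from inside the TaskManager pod.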