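If you expose the REST service with the ClusterIP type, jobs can only be submitted from inside the K8s cluster, because hosts outside the cluster cannot resolve the K8s-internal service name (here, tuiwen-flink-rest.flink).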
You can start a Pod inside the K8s cluster to act as the Flink client, and then submit jobs from within that Pod.

Best,
Yang

吴松 <wus...@funstory.ai> wrote on Tue, Nov 24, 2020 at 4:23 PM:

> Sorry, that error is most likely a memory issue. The error I actually wanted to ask about is the one below.
>
> 2020-11-24 16:19:33,569 ERROR org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient [] - A Kubernetes exception occurred.
> java.net.UnknownHostException: tuiwen-flink-rest.flink: Name or service not known
>     at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method) ~[?:1.8.0_252]
>     at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929) ~[?:1.8.0_252]
>     at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324) ~[?:1.8.0_252]
>     at java.net.InetAddress.getAllByName0(InetAddress.java:1277) ~[?:1.8.0_252]
>     at java.net.InetAddress.getAllByName(InetAddress.java:1193) ~[?:1.8.0_252]
>     at java.net.InetAddress.getAllByName(InetAddress.java:1127) ~[?:1.8.0_252]
>     at java.net.InetAddress.getByName(InetAddress.java:1077) ~[?:1.8.0_252]
>     at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.getWebMonitorAddress(HighAvailabilityServicesUtils.java:193) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>     at org.apache.flink.kubernetes.KubernetesClusterDescriptor.lambda$createClusterClientProvider$0(KubernetesClusterDescriptor.java:113) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>     at org.apache.flink.kubernetes.KubernetesClusterDescriptor.deploySessionCluster(KubernetesClusterDescriptor.java:142) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>     at org.apache.flink.kubernetes.cli.KubernetesSessionCli.run(KubernetesSessionCli.java:109) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>     at org.apache.flink.kubernetes.cli.KubernetesSessionCli.lambda$main$0(KubernetesSessionCli.java:188) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>     at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30) [flink-dist_2.12-1.11.2.jar:1.11.2]
>     at org.apache.flink.kubernetes.cli.KubernetesSessionCli.main(KubernetesSessionCli.java:188) [flink-dist_2.12-1.11.2.jar:1.11.2]
> 2020-11-24 16:19:33,606 ERROR org.apache.flink.kubernetes.cli.KubernetesSessionCli [] - Error while running the Flink session.
> java.lang.RuntimeException: org.apache.flink.client.deployment.ClusterRetrieveException: Could not create the RestClusterClient.
>     at org.apache.flink.kubernetes.KubernetesClusterDescriptor.lambda$createClusterClientProvider$0(KubernetesClusterDescriptor.java:117) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>     at org.apache.flink.kubernetes.KubernetesClusterDescriptor.deploySessionCluster(KubernetesClusterDescriptor.java:142) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>     at org.apache.flink.kubernetes.cli.KubernetesSessionCli.run(KubernetesSessionCli.java:109) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>     at org.apache.flink.kubernetes.cli.KubernetesSessionCli.lambda$main$0(KubernetesSessionCli.java:188) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>     at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>     at org.apache.flink.kubernetes.cli.KubernetesSessionCli.main(KubernetesSessionCli.java:188) [flink-dist_2.12-1.11.2.jar:1.11.2]
> Caused by: org.apache.flink.client.deployment.ClusterRetrieveException: Could not create the RestClusterClient.
>     ... 6 more
> Caused by: java.net.UnknownHostException: tuiwen-flink-rest.flink: Name or service not known
>     at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method) ~[?:1.8.0_252]
>     at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929) ~[?:1.8.0_252]
>     at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324) ~[?:1.8.0_252]
>     at java.net.InetAddress.getAllByName0(InetAddress.java:1277) ~[?:1.8.0_252]
>     at java.net.InetAddress.getAllByName(InetAddress.java:1193) ~[?:1.8.0_252]
>     at java.net.InetAddress.getAllByName(InetAddress.java:1127) ~[?:1.8.0_252]
>     at java.net.InetAddress.getByName(InetAddress.java:1077) ~[?:1.8.0_252]
>     at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.getWebMonitorAddress(HighAvailabilityServicesUtils.java:193) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>     at org.apache.flink.kubernetes.KubernetesClusterDescriptor.lambda$createClusterClientProvider$0(KubernetesClusterDescriptor.java:113) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>     ... 5 more
>
> ------------------------------------------------------------
>  The program finished with the following exception:
>
> java.lang.RuntimeException: org.apache.flink.client.deployment.ClusterRetrieveException: Could not create the RestClusterClient.
>     at org.apache.flink.kubernetes.KubernetesClusterDescriptor.lambda$createClusterClientProvider$0(KubernetesClusterDescriptor.java:117)
>     at org.apache.flink.kubernetes.KubernetesClusterDescriptor.deploySessionCluster(KubernetesClusterDescriptor.java:142)
>     at org.apache.flink.kubernetes.cli.KubernetesSessionCli.run(KubernetesSessionCli.java:109)
>     at org.apache.flink.kubernetes.cli.KubernetesSessionCli.lambda$main$0(KubernetesSessionCli.java:188)
>     at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
>     at org.apache.flink.kubernetes.cli.KubernetesSessionCli.main(KubernetesSessionCli.java:188)
> Caused by: org.apache.flink.client.deployment.ClusterRetrieveException: Could not create the RestClusterClient.
>     ... 6 more
> Caused by: java.net.UnknownHostException: tuiwen-flink-rest.flink: Name or service not known
>     at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
>     at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
>     at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
>     at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
>     at java.net.InetAddress.getAllByName(InetAddress.java:1193)
>     at java.net.InetAddress.getAllByName(InetAddress.java:1127)
>     at java.net.InetAddress.getByName(InetAddress.java:1077)
>     at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.getWebMonitorAddress(HighAvailabilityServicesUtils.java:193)
>     at org.apache.flink.kubernetes.KubernetesClusterDescriptor.lambda$createClusterClientProvider$0(KubernetesClusterDescriptor.java:113)
>     ... 5 more
>
> ------------------ Original ------------------
> From: "吴松"<wus...@funstory.ai>;
> Date: Tue, Nov 24, 2020 03:51 PM
> To: "user-zh"<user-zh@flink.apache.org>;
>
> Subject: flink on native k8s deploy issue
>
> Starting Flink with the -Dkubernetes.rest-service.exposed.type=ClusterIP option fails with the following error:
>
> 2020-11-24 15:49:19,796 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: jobmanager.rpc.address, 0.0.0.0
> 2020-11-24 15:49:19,800 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: jobmanager.rpc.port, 6123
> 2020-11-24 15:49:19,801 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: jobmanager.memory.process.size, 1600m
> 2020-11-24 15:49:19,801 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: taskmanager.memory.process.size, 1800m
> 2020-11-24 15:49:19,801 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: taskmanager.numberOfTaskSlots, 1
> 2020-11-24 15:49:19,802 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: parallelism.default, 1
> 2020-11-24 15:49:19,802 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: high-availability, zookeeper
> 2020-11-24 15:49:19,803 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: high-availability.cluster-id, /tuiwen-flink
> 2020-11-24 15:49:19,803 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: high-availability.storageDir, file:/usr/flink/tuiwen-flink
> 2020-11-24 15:49:19,804 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: high-availability.zookeeper.quorum, data-kafka-zookeeper-headless.tuiwen-public:2181
> 2020-11-24 15:49:19,804 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: state.backend, rocksdb
> 2020-11-24 15:49:19,805 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: state.checkpoints.dir, file:/usr/flink/flink-checkpoints
> 2020-11-24 15:49:19,805 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: state.checkpoints.num-retained, 100
> 2020-11-24 15:49:19,805 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: state.savepoints.dir, file:/usr/flink/flink-savepoints
> 2020-11-24 15:49:19,806 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: jobmanager.execution.failover-strategy, region
> 2020-11-24 15:49:19,806 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: web.upload.dir, /usr/flink
> 2020-11-24 15:49:19,990 INFO org.apache.flink.client.deployment.DefaultClusterClientServiceLoader [] - Could not load factory due to missing dependencies.
> 2020-11-24 15:49:22,366 INFO org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils [] - The derived from fraction jvm overhead memory (160.000mb (167772162 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
> 2020-11-24 15:49:22,399 INFO org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils [] - The derived from fraction jvm overhead memory (70.000mb (73400321 bytes)) is less than its min value 192.000mb (201326592 bytes), min value will be used instead
> 2020-11-24 15:49:22,401 INFO org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils [] - The derived from fraction network memory (25.200mb (26424115 bytes)) is less than its min value 64.000mb (67108864 bytes), min value will be used instead
> 2020-11-24 15:49:22,405 ERROR org.apache.flink.kubernetes.cli.KubernetesSessionCli [] - Error while running the Flink session.
> org.apache.flink.configuration.IllegalConfigurationException: Sum of configured Framework Heap Memory (128.000mb (134217728 bytes)), Framework Off-Heap Memory (128.000mb (134217728 bytes)), Task Off-Heap Memory (0 bytes), Managed Memory (100.800mb (105696462 bytes)) and Network Memory (64.000mb (67108864 bytes)) exceed configured Total Flink Memory (252.000mb (264241152 bytes)).
>     at org.apache.flink.runtime.util.config.memory.taskmanager.TaskExecutorFlinkMemoryUtils.deriveFromTotalFlinkMemory(TaskExecutorFlinkMemoryUtils.java:136) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>     at org.apache.flink.runtime.util.config.memory.taskmanager.TaskExecutorFlinkMemoryUtils.deriveFromTotalFlinkMemory(TaskExecutorFlinkMemoryUtils.java:42) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>     at org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.deriveProcessSpecWithTotalProcessMemory(ProcessMemoryUtils.java:105) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>     at org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.memoryProcessSpecFromConfig(ProcessMemoryUtils.java:79) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>     at org.apache.flink.runtime.clusterframework.TaskExecutorProcessUtils.processSpecFromConfig(TaskExecutorProcessUtils.java:109) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>     at org.apache.flink.client.deployment.AbstractContainerizedClusterClientFactory.getClusterSpecification(AbstractContainerizedClusterClientFactory.java:47) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>     at org.apache.flink.kubernetes.cli.KubernetesSessionCli.run(KubernetesSessionCli.java:110) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>     at org.apache.flink.kubernetes.cli.KubernetesSessionCli.lambda$main$0(KubernetesSessionCli.java:188) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>     at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30) ~[flink-dist_2.12-1.11.2.jar:1.11.2]
>     at org.apache.flink.kubernetes.cli.KubernetesSessionCli.main(KubernetesSessionCli.java:188) [flink-dist_2.12-1.11.2.jar:1.11.2]
>
> ------------------------------------------------------------
>  The program finished with the following exception:
>
> org.apache.flink.configuration.IllegalConfigurationException: Sum of configured Framework Heap Memory (128.000mb (134217728 bytes)), Framework Off-Heap Memory (128.000mb (134217728 bytes)), Task Off-Heap Memory (0 bytes), Managed Memory (100.800mb (105696462 bytes)) and Network Memory (64.000mb (67108864 bytes)) exceed configured Total Flink Memory (252.000mb (264241152 bytes)).
>     at org.apache.flink.runtime.util.config.memory.taskmanager.TaskExecutorFlinkMemoryUtils.deriveFromTotalFlinkMemory(TaskExecutorFlinkMemoryUtils.java:136)
>     at org.apache.flink.runtime.util.config.memory.taskmanager.TaskExecutorFlinkMemoryUtils.deriveFromTotalFlinkMemory(TaskExecutorFlinkMemoryUtils.java:42)
>     at org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.deriveProcessSpecWithTotalProcessMemory(ProcessMemoryUtils.java:105)
>     at org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.memoryProcessSpecFromConfig(ProcessMemoryUtils.java:79)
>     at org.apache.flink.runtime.clusterframework.TaskExecutorProcessUtils.processSpecFromConfig(TaskExecutorProcessUtils.java:109)
>     at org.apache.flink.client.deployment.AbstractContainerizedClusterClientFactory.getClusterSpecification(AbstractContainerizedClusterClientFactory.java:47)
>     at org.apache.flink.kubernetes.cli.KubernetesSessionCli.run(KubernetesSessionCli.java:110)
>     at org.apache.flink.kubernetes.cli.KubernetesSessionCli.lambda$main$0(KubernetesSessionCli.java:188)
>     at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
>     at org.apache.flink.kubernetes.cli.KubernetesSessionCli.main(KubernetesSessionCli.java:188)
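
For readers following the first (memory) error in the quoted message, the numbers in the IllegalConfigurationException can be checked by hand. The sketch below only redoes that arithmetic with the values printed in the log. The last part is an inference, not something the log states: it assumes the JVM overhead fraction is 0.1 and the JVM metaspace size is 256 MB (believed to be the Flink 1.11 defaults), which would mean the TaskManager process size in effect was about 700 MB rather than the 1800m shown in the loaded configuration.

```python
# All values in MB, copied from the IllegalConfigurationException in the quoted log.
components_mb = {
    "framework_heap": 128.0,
    "framework_off_heap": 128.0,
    "task_off_heap": 0.0,
    "managed": 100.8,
    "network": 64.0,  # raised to its 64 MB minimum, per the log
}
total_flink_memory_mb = 252.0

required_mb = sum(components_mb.values())
shortfall_mb = required_mb - total_flink_memory_mb
print(f"required {required_mb:.1f} MB > available {total_flink_memory_mb:.1f} MB "
      f"(short by {shortfall_mb:.1f} MB)")

# ASSUMPTIONS (not stated in the log): overhead fraction 0.1, metaspace 256 MB.
overhead_from_fraction_mb = 70.0   # "derived from fraction jvm overhead memory" in the log
overhead_min_mb = 192.0            # minimum that the log says is used instead
assumed_overhead_fraction = 0.1
assumed_metaspace_mb = 256.0

implied_process_size_mb = overhead_from_fraction_mb / assumed_overhead_fraction
implied_total_flink_mb = implied_process_size_mb - overhead_min_mb - assumed_metaspace_mb
print(f"implied process size {implied_process_size_mb:.0f} MB, "
      f"implied total Flink memory {implied_total_flink_mb:.0f} MB")
```

If those assumptions hold, the implied total matches the 252 MB in the exception, so a larger TaskManager process size would be the first thing to try.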