Hi Tino,

Judging from org.apache.flink.core.fs.FileSystem.java <https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/fs/FileSystem.java#L361-L371>, Flink parses fs.default-scheme directly as a URI and does not parse the related XML configuration at that point, so it looks like Flink currently does not support HA HDFS clusters.
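
As a rough illustration of what treating the value as a plain URI implies, here is a minimal Python sketch using only the standard library. It is not Flink's or Hadoop's code, and the helper name `split_uri` is made up for this example. It splits the URI forms mentioned in this thread into scheme, host, port, and path.

```python
from urllib.parse import urlparse


def split_uri(uri):
    """Return (scheme, host, port, path) for a URI string.

    Hypothetical helper for illustration only; it says nothing about
    how Flink or Hadoop resolve the host afterwards.
    """
    parsed = urlparse(uri)
    return (parsed.scheme, parsed.hostname, parsed.port, parsed.path)


if __name__ == "__main__":
    # URI forms taken from the messages in this thread.
    for uri in (
        "hdfs://datacluster",
        "hdfs://datacluster:8080",
        "hdfs:///datacluster",
        "hdfs://namenode-active:8020/flink/recovery",
    ):
        print(uri, "->", split_uri(uri))
```

With hdfs://datacluster the authority is just the name datacluster with no port, so something outside the URI has to map that name to real NameNode addresses. With three slashes there is no authority at all, and datacluster becomes the first path segment.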

Best,
Yanfei

Xuyang <xyzhong...@163.com> wrote on Wed, Sep 21, 2022 at 23:28:

> Hi, I am not very familiar with HA HDFS deployments, but judging from the
> stack trace, the error is caused by a hostname that cannot be resolved:
>
> Caused by: java.lang.IllegalArgumentException:
> java.net.UnknownHostException: datacluster
>
> My guess is that you could change the value to one of the following two forms:
>
> 1. hdfs://datacluster:port (for example, hdfs://datacluster:8080)
> 2. hdfs:///datacluster (three slashes)
>
> Hope this helps.
>
> --
> Best!
> Xuyang
>
> At 2022-09-21 18:24:46, "Tino Hean" <tinoh...@gmail.com> wrote:
> > *Hi all,*
> > *I am testing Flink in Kubernetes deployment mode against an HA HDFS
> > cluster. Here are my submission command arguments:*
> >
> > ./bin/flink run-application \
> >   --detached \
> >   --target kubernetes-application \
> >   -Dkubernetes.cluster-id=test \
> >   -Dkubernetes.container.image=flink-java11 \
> >   -Dfs.default-scheme=hdfs://datacluster \
> >   -Dkubernetes.rest-service.exposed.type=LoadBalancer \
> >   -Dhigh-availability=org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory \
> >   -Dhigh-availability.storageDir=hdfs://datacluster/flink/recovery \
> >   -Dkubernetes.namespace=flink \
> >   -Dkubernetes.service-account=flink-sa \
> >   -Denv.hadoop.conf.dir=/opt/flink/conf \
> >   -Dkubernetes.container.image.pull-policy=Always \
> >   local:///opt/flink/usrlib/test.jar
> >
> > *I have copied core-site.xml and hdfs-site.xml into $FLINK_HOME/conf. The
> > directory looks like this:*
> >
> > flink@e3187a41a139:~$ ls conf
> > core-site.xml hdfs-site.xml log4j-console.properties log4j-session.properties logback-session.xml masters zoo.cfg
> > flink-conf.yaml log4j-cli.properties log4j.properties logback-console.xml logback.xml workers
> >
> > *But I ran into the following error:*
> >
> > 2022-09-21 10:17:40,156 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Could not start cluster entrypoint KubernetesApplicationClusterEntrypoint.
> > org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to initialize the cluster entrypoint KubernetesApplicationClusterEntrypoint.
> >     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:250) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:711) [flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.main(KubernetesApplicationClusterEntrypoint.java:86) [flink-dist-1.15.2.jar:1.15.2]
> > Caused by: org.apache.flink.util.FlinkException: Could not create the ha services from the instantiated HighAvailabilityServicesFactory org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory.
> >     at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createCustomHAServices(HighAvailabilityServicesUtils.java:287) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:143) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createHaServices(ClusterEntrypoint.java:427) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:376) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:277) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$1(ClusterEntrypoint.java:227) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at java.security.AccessController.doPrivileged(Native Method) ~[?:?]
> >     at javax.security.auth.Subject.doAs(Unknown Source) ~[?:?]
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898) ~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
> >     at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:224) ~[flink-dist-1.15.2.jar:1.15.2]
> >     ... 2 more
> > Caused by: java.io.IOException: Could not create FileSystem for highly available storage path (hdfs://datacluster/flink/recovery/cruiser)
> >     at org.apache.flink.runtime.blob.BlobUtils.createFileSystemBlobStore(BlobUtils.java:102) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.blob.BlobUtils.createBlobStoreFromConfig(BlobUtils.java:86) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory.createHAServices(KubernetesHaServicesFactory.java:53) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createCustomHAServices(HighAvailabilityServicesUtils.java:284) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:143) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createHaServices(ClusterEntrypoint.java:427) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:376) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:277) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$1(ClusterEntrypoint.java:227) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at java.security.AccessController.doPrivileged(Native Method) ~[?:?]
> >     at javax.security.auth.Subject.doAs(Unknown Source) ~[?:?]
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898) ~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
> >     at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:224) ~[flink-dist-1.15.2.jar:1.15.2]
> >     ... 2 more
> > Caused by: java.io.IOException: Cannot instantiate file system for URI: hdfs://datacluster/flink/recovery/cruiser
> >     at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:196) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:528) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:409) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.core.fs.Path.getFileSystem(Path.java:274) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.blob.BlobUtils.createFileSystemBlobStore(BlobUtils.java:99) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.blob.BlobUtils.createBlobStoreFromConfig(BlobUtils.java:86) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory.createHAServices(KubernetesHaServicesFactory.java:53) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createCustomHAServices(HighAvailabilityServicesUtils.java:284) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:143) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createHaServices(ClusterEntrypoint.java:427) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:376) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:277) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$1(ClusterEntrypoint.java:227) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at java.security.AccessController.doPrivileged(Native Method) ~[?:?]
> >     at javax.security.auth.Subject.doAs(Unknown Source) ~[?:?]
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898) ~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
> >     at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:224) ~[flink-dist-1.15.2.jar:1.15.2]
> >     ... 2 more
> > Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: datacluster
> >     at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:445) ~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
> >     at org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:140) ~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
> >     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:358) ~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
> >     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:292) ~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
> >     at org.apache.hadoop.hdfs.DistributedFileSystem.initDFSClient(DistributedFileSystem.java:200) ~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
> >     at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:185) ~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
> >     at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:168) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:528) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:409) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.core.fs.Path.getFileSystem(Path.java:274) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.blob.BlobUtils.createFileSystemBlobStore(BlobUtils.java:99) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.blob.BlobUtils.createBlobStoreFromConfig(BlobUtils.java:86) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory.createHAServices(KubernetesHaServicesFactory.java:53) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createCustomHAServices(HighAvailabilityServicesUtils.java:284) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:143) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createHaServices(ClusterEntrypoint.java:427) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:376) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:277) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$1(ClusterEntrypoint.java:227) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at java.security.AccessController.doPrivileged(Native Method) ~[?:?]
> >     at javax.security.auth.Subject.doAs(Unknown Source) ~[?:?]
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898) ~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
> >     at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:224) ~[flink-dist-1.15.2.jar:1.15.2]
> >     ... 2 more
> > Caused by: java.net.UnknownHostException: datacluster
> >     at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:445) ~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
> >     at org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:140) ~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
> >     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:358) ~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
> >     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:292) ~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
> >     at org.apache.hadoop.hdfs.DistributedFileSystem.initDFSClient(DistributedFileSystem.java:200) ~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
> >     at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:185) ~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
> >     at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:168) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:528) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:409) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.core.fs.Path.getFileSystem(Path.java:274) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.blob.BlobUtils.createFileSystemBlobStore(BlobUtils.java:99) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.blob.BlobUtils.createBlobStoreFromConfig(BlobUtils.java:86) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory.createHAServices(KubernetesHaServicesFactory.java:53) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createCustomHAServices(HighAvailabilityServicesUtils.java:284) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:143) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createHaServices(ClusterEntrypoint.java:427) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:376) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:277) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$1(ClusterEntrypoint.java:227) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at java.security.AccessController.doPrivileged(Native Method) ~[?:?]
> >     at javax.security.auth.Subject.doAs(Unknown Source) ~[?:?]
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898) ~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
> >     at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) ~[flink-dist-1.15.2.jar:1.15.2]
> >     at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:224) ~[flink-dist-1.15.2.jar:1.15.2]
> >     ... 2 more
> >
> > *If I change the following options to point at the active NameNode
> > instead, the job submits and runs normally:*
> >
> >   -Dfs.default-scheme=hdfs://namenode-active:8020 \
> >   -Dhigh-availability.storageDir=hdfs://namenode-active:8020/flink/recovery \
> >
> > So I still cannot figure out what I got wrong. Does Flink not support HA
> > HDFS clusters? Any pointers would be much appreciated. Thanks a lot!
> >
> > *Additional information:*
> > Hadoop version: 3.3.4
> > Flink version: 1.15.2
> >
> > *The relevant Hadoop hdfs-site.xml configuration is as follows:*
> >
> > <property>
> >   <name>dfs.nameservices</name>
> >   <value>datacluster</value>
> > </property>
> > <property>
> >   <name>dfs.ha.namenodes.datacluster</name>
> >   <value>nn1,nn2</value>
> > </property>
> > <property>
> >   <name>dfs.namenode.rpc-address.datacluster.nn1</name>
> >   <value>namenode-active:8020</value>
> > </property>
> > <property>
> >   <name>dfs.namenode.rpc-address.datacluster.nn2</name>
> >   <value>namenode-standby:8020</value>
> > </property>
> > <property>
> >   <name>dfs.namenode.http-address.datacluster.nn1</name>
> >   <value>namenode-active:9870</value>
> > </property>
> > <property>
> >   <name>dfs.namenode.http-address.datacluster.nn2</name>
> >   <value>namenode-standby:9870</value>
> > </property>
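
For anyone debugging a similar setup, the hdfs-site.xml excerpt above can be sanity-checked offline. The following is a minimal Python sketch, not Hadoop's or Flink's actual lookup code. It reads the quoted properties and shows how the logical name datacluster maps to NameNode RPC addresses through dfs.nameservices, dfs.ha.namenodes.<nameservice>, and dfs.namenode.rpc-address.<nameservice>.<id>. The function names are made up for this example. It also checks for dfs.client.failover.proxy.provider.<nameservice>, which the Hadoop HA documentation lists as a client-side setting. The excerpt in this thread is described as partial, so that key may simply have been left out of the email.

```python
import xml.etree.ElementTree as ET

# Properties copied from the hdfs-site.xml excerpt in this thread, wrapped in
# a <configuration> root element so the snippet parses on its own.
HDFS_SITE_XML = """
<configuration>
  <property><name>dfs.nameservices</name><value>datacluster</value></property>
  <property><name>dfs.ha.namenodes.datacluster</name><value>nn1,nn2</value></property>
  <property><name>dfs.namenode.rpc-address.datacluster.nn1</name><value>namenode-active:8020</value></property>
  <property><name>dfs.namenode.rpc-address.datacluster.nn2</name><value>namenode-standby:8020</value></property>
  <property><name>dfs.namenode.http-address.datacluster.nn1</name><value>namenode-active:9870</value></property>
  <property><name>dfs.namenode.http-address.datacluster.nn2</name><value>namenode-standby:9870</value></property>
</configuration>
"""


def load_properties(xml_text):
    """Parse Hadoop-style <property><name/><value/></property> entries into a dict."""
    root = ET.fromstring(xml_text.strip())
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def split_csv(value):
    """Split a comma-separated config value, dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def namenode_rpc_addresses(props, authority):
    """Map a logical nameservice to {namenode id: rpc address}.

    Returns None when `authority` is not listed in dfs.nameservices, in which
    case a client would have to treat it as an ordinary hostname.
    """
    if authority not in split_csv(props.get("dfs.nameservices", "")):
        return None
    ids = split_csv(props.get("dfs.ha.namenodes." + authority, ""))
    return {
        nn: props.get("dfs.namenode.rpc-address.%s.%s" % (authority, nn))
        for nn in ids
    }


def missing_client_keys(props, nameservice):
    """List HA client keys that are absent for the given nameservice."""
    wanted = [
        "dfs.ha.namenodes." + nameservice,
        "dfs.client.failover.proxy.provider." + nameservice,
    ]
    return [key for key in wanted if key not in props]


if __name__ == "__main__":
    props = load_properties(HDFS_SITE_XML)
    print(namenode_rpc_addresses(props, "datacluster"))
    print(missing_client_keys(props, "datacluster"))
```

Running the same functions against the hdfs-site.xml that is actually visible inside the JobManager container, rather than a local copy, is one way to tell whether the configuration reached the pod at all.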