Hi, I'm not very familiar with HA HDFS deployments, but the stack trace shows the failure is caused by a hostname that cannot be resolved:
Caused by: java.lang.IllegalArgumentException:
java.net.UnknownHostException: datacluster
I'd guess you could try changing the URI to one of the following two forms:

1. hdfs://datacluster:port (e.g. hdfs://datacluster:8020)

2. hdfs:///datacluster (three slashes)
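
Also, one more thing worth double-checking — this is an assumption on my side, since it doesn't appear in the hdfs-site.xml you quoted below: for the Hadoop client to resolve a logical HA nameservice, hdfs-site.xml usually also needs a client failover proxy provider. Without it, the client treats `datacluster` as a plain hostname and fails with exactly this UnknownHostException. A sketch of the property, assuming the standard provider:

```xml
<!-- Hypothetical addition: lets the HDFS client resolve the logical
     nameservice "datacluster" to nn1/nn2 instead of asking DNS. -->
<property>
  <name>dfs.client.failover.proxy.provider.datacluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```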




Hope this helps.
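
P.S. A quick sanity check you could run inside the JobManager pod before digging further — a rough sketch, with the helper name made up by me and the path assumed to match your submission command:

```shell
#!/bin/sh
# Hypothetical helper: verify that the nameservice referenced by
# fs.default-scheme is defined in the hdfs-site.xml Flink will load.
# This is only a rough textual check, not a full XML parse.
check_nameservice() {
  conf_dir=$1
  nameservice=$2
  # dfs.nameservices must list the logical name; otherwise the Hadoop
  # client falls back to DNS and throws UnknownHostException.
  if grep -q "<value>${nameservice}</value>" "${conf_dir}/hdfs-site.xml" 2>/dev/null; then
    echo "ok: ${nameservice} defined in ${conf_dir}/hdfs-site.xml"
    return 0
  fi
  echo "missing: ${nameservice} not found in ${conf_dir}/hdfs-site.xml"
  return 1
}

# Example (run inside the JobManager pod):
# check_nameservice /opt/flink/conf datacluster
```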

--

    Best!
    Xuyang





On 2022-09-21 18:24:46, "Tino Hean" <tinoh...@gmail.com> wrote:
>Hi all,
>I'm testing an HDFS cluster with HA enabled, deployed via Flink's Kubernetes application mode. Below are my submission command parameters:
>./bin/flink run-application \
>    --detached \
>    --target kubernetes-application \
>    -Dkubernetes.cluster-id=test \
>    -Dkubernetes.container.image=flink-java11 \
>    -Dfs.default-scheme=hdfs://datacluster \
>    -Dkubernetes.rest-service.exposed.type=LoadBalancer \
>    -Dhigh-availability=org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory \
>    -Dhigh-availability.storageDir=hdfs://datacluster/flink/recovery \
>    -Dkubernetes.namespace=flink \
>    -Dkubernetes.service-account=flink-sa \
>    -Denv.hadoop.conf.dir=/opt/flink/conf \
>    -Dkubernetes.container.image.pull-policy=Always \
>    local:///opt/flink/usrlib/test.jar
>
>I have copied core-site.xml and hdfs-site.xml into $FLINK_HOME/conf; the directory layout is as follows:
>flink@e3187a41a139:~$ ls conf
>core-site.xml hdfs-site.xml log4j-console.properties
>log4j-session.properties logback-session.xml masters zoo.cfg
>flink-conf.yaml log4j-cli.properties log4j.properties logback-console.xml
>logback.xml workers
>
>But I ran into the following error:
>
>2022-09-21 10:17:40,156 ERROR
>org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Could not
>start cluster entrypoint KubernetesApplicationClusterEntrypoint.
>org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to
>initialize the cluster entrypoint KubernetesApplicationClusterEntrypoint.
>        at
>org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:250)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:711)
>[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.kubernetes.entrypoint.KubernetesApplicationClusterEntrypoint.main(KubernetesApplicationClusterEntrypoint.java:86)
>[flink-dist-1.15.2.jar:1.15.2]
>Caused by: org.apache.flink.util.FlinkException: Could not create the ha
>services from the instantiated HighAvailabilityServicesFactory
>org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory.
>        at
>org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createCustomHAServices(HighAvailabilityServicesUtils.java:287)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:143)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createHaServices(ClusterEntrypoint.java:427)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:376)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:277)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$1(ClusterEntrypoint.java:227)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at java.security.AccessController.doPrivileged(Native Method) ~[?:?]
>        at javax.security.auth.Subject.doAs(Unknown Source) ~[?:?]
>        at
>org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
>~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
>        at
>org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:224)
>~[flink-dist-1.15.2.jar:1.15.2]
>        ... 2 more
>Caused by: java.io.IOException: Could not create FileSystem for highly
>available storage path (hdfs://datacluster/flink/recovery/cruiser)
>        at
>org.apache.flink.runtime.blob.BlobUtils.createFileSystemBlobStore(BlobUtils.java:102)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.blob.BlobUtils.createBlobStoreFromConfig(BlobUtils.java:86)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory.createHAServices(KubernetesHaServicesFactory.java:53)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createCustomHAServices(HighAvailabilityServicesUtils.java:284)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:143)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createHaServices(ClusterEntrypoint.java:427)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:376)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:277)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$1(ClusterEntrypoint.java:227)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at java.security.AccessController.doPrivileged(Native Method) ~[?:?]
>        at javax.security.auth.Subject.doAs(Unknown Source) ~[?:?]
>        at
>org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
>~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
>        at
>org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:224)
>~[flink-dist-1.15.2.jar:1.15.2]
>        ... 2 more
>Caused by: java.io.IOException: Cannot instantiate file system for URI:
>hdfs://datacluster/flink/recovery/cruiser
>        at
>org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:196)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:528)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:409)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at org.apache.flink.core.fs.Path.getFileSystem(Path.java:274)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.blob.BlobUtils.createFileSystemBlobStore(BlobUtils.java:99)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.blob.BlobUtils.createBlobStoreFromConfig(BlobUtils.java:86)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory.createHAServices(KubernetesHaServicesFactory.java:53)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createCustomHAServices(HighAvailabilityServicesUtils.java:284)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:143)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createHaServices(ClusterEntrypoint.java:427)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:376)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:277)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$1(ClusterEntrypoint.java:227)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at java.security.AccessController.doPrivileged(Native Method) ~[?:?]
>        at javax.security.auth.Subject.doAs(Unknown Source) ~[?:?]
>        at
>org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
>~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
>        at
>org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:224)
>~[flink-dist-1.15.2.jar:1.15.2]
>        ... 2 more
>Caused by: java.lang.IllegalArgumentException:
>java.net.UnknownHostException: datacluster
>        at
>org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:445)
>~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
>        at
>org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:140)
>~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
>        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:358)
>~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
>        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:292)
>~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
>        at
>org.apache.hadoop.hdfs.DistributedFileSystem.initDFSClient(DistributedFileSystem.java:200)
>~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
>        at
>org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:185)
>~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
>        at
>org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:168)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:528)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:409)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at org.apache.flink.core.fs.Path.getFileSystem(Path.java:274)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.blob.BlobUtils.createFileSystemBlobStore(BlobUtils.java:99)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.blob.BlobUtils.createBlobStoreFromConfig(BlobUtils.java:86)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory.createHAServices(KubernetesHaServicesFactory.java:53)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createCustomHAServices(HighAvailabilityServicesUtils.java:284)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:143)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createHaServices(ClusterEntrypoint.java:427)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:376)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:277)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$1(ClusterEntrypoint.java:227)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at java.security.AccessController.doPrivileged(Native Method) ~[?:?]
>        at javax.security.auth.Subject.doAs(Unknown Source) ~[?:?]
>        at
>org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
>~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
>        at
>org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:224)
>~[flink-dist-1.15.2.jar:1.15.2]
>        ... 2 more
>Caused by: java.net.UnknownHostException: datacluster
>        at
>org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:445)
>~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
>        at
>org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:140)
>~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
>        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:358)
>~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
>        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:292)
>~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
>        at
>org.apache.hadoop.hdfs.DistributedFileSystem.initDFSClient(DistributedFileSystem.java:200)
>~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
>        at
>org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:185)
>~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
>        at
>org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:168)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:528)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:409)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at org.apache.flink.core.fs.Path.getFileSystem(Path.java:274)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.blob.BlobUtils.createFileSystemBlobStore(BlobUtils.java:99)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.blob.BlobUtils.createBlobStoreFromConfig(BlobUtils.java:86)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory.createHAServices(KubernetesHaServicesFactory.java:53)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createCustomHAServices(HighAvailabilityServicesUtils.java:284)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:143)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createHaServices(ClusterEntrypoint.java:427)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:376)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:277)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$1(ClusterEntrypoint.java:227)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at java.security.AccessController.doPrivileged(Native Method) ~[?:?]
>        at javax.security.auth.Subject.doAs(Unknown Source) ~[?:?]
>        at
>org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
>~[flink-shaded-hadoop-3-uber-3.1.1.7.2.9.0-173-9.0.jar:3.1.1.7.2.9.0-173-9.0]
>        at
>org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>~[flink-dist-1.15.2.jar:1.15.2]
>        at
>org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:224)
>~[flink-dist-1.15.2.jar:1.15.2]
>        ... 2 more
>
>When I change the following submission parameters to point at the active NameNode, the job submits and runs normally:
>    -Dfs.default-scheme=hdfs://namenode-active:8020 \
>    -Dhigh-availability.storageDir=hdfs://namenode-active:8020/flink/recovery \
>
>So I still can't figure out what I got wrong. Does Flink not support an HDFS cluster with HA enabled? Any clarification would be much appreciated, thanks a lot!
>
>Additional information:
>Hadoop version: 3.3.4
>Flink version: 1.15.2
>
>The relevant hdfs-site.xml configuration on the Hadoop side is as follows:
>
><property>
>  <name>dfs.nameservices</name>
>  <value>datacluster</value>
></property>
><property>
>  <name>dfs.ha.namenodes.datacluster</name>
>  <value>nn1,nn2</value>
></property>
><property>
>  <name>dfs.namenode.rpc-address.datacluster.nn1</name>
>  <value>namenode-active:8020</value>
></property>
><property>
>  <name>dfs.namenode.rpc-address.datacluster.nn2</name>
>  <value>namenode-standby:8020</value>
></property>
><property>
>  <name>dfs.namenode.http-address.datacluster.nn1</name>
>  <value>namenode-active:9870</value>
></property>
><property>
>  <name>dfs.namenode.http-address.datacluster.nn2</name>
>  <value>namenode-standby:9870</value>
></property>
