Hi all,

I have two Hadoop clusters (hdfs://mycluster1 and hdfs://mycluster2), both 
configured with HA.
I have a job that reads streaming data from Kafka and writes it to HDFS with 
StreamingFileSink. The job is deployed on mycluster1 (Flink on YARN), and I 
want to write the data to mycluster2. How do I configure this? If I put 
hdfs://mycluster2/tmp/abc directly in the StreamingFileSink path, 
it reports that mycluster2 could not be found.
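To make the setup concrete, here is a minimal sketch of the job as described 
above (the Kafka source is replaced by a placeholder stream, and the class 
name and record type are just for illustration):

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class CrossClusterSinkJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder for the Kafka source described above.
        DataStream<String> stream = env.fromElements("record-1", "record-2");

        // The sink path points at the other cluster's HA nameservice; this is
        // the part that currently fails with "mycluster2 could not be found".
        StreamingFileSink<String> sink = StreamingFileSink
                .forRowFormat(new Path("hdfs://mycluster2/tmp/abc"),
                              new SimpleStringEncoder<String>("UTF-8"))
                .build();

        stream.addSink(sink);
        env.execute("write to mycluster2");
    }
}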
I looked at the source code of 
org.apache.flink.runtime.fs.hdfs.HadoopFsFactory#create. When Flink loads 
core-site.xml and hdfs-site.xml, it tries hadoopConfig first, then 
flinkConfig, and finally the classpath. In my case flinkConfig does not seem 
to be empty, so the configuration is loaded via flinkConfig and ultimately 
from HADOOP_HOME. The core-site.xml and hdfs-site.xml of the mycluster1 
cluster therefore do not contain the nameservice information of mycluster2, 
which causes mycluster2 not to be found.
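If it helps to make the question concrete, my understanding is that the 
hdfs-site.xml visible to Flink is missing HA client entries for mycluster2 
along these lines (the namenode ids nn1/nn2 and host names are placeholders, 
not my real values):

<!-- Sketch of the entries that mycluster1's hdfs-site.xml lacks for
     mycluster2. Namenode ids and host names below are placeholders. -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster1,mycluster2</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster2</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster2.nn1</name>
  <value>nn1-host.mycluster2:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster2.nn2</name>
  <value>nn2-host.mycluster2:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster2</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>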
thanks
