Hi there,
For several days I have been trying to find the right configuration for my
pipeline which roughly consists in the following schema
RabbitMQ->PyFlink->Nessie/Iceberg/S3.
For what I am going to explain I have tried both locally and through the
official Flink docker images.
I have tried several different flink versions, but for simplicity let's say I
am using the apache-flink==1.18.0 version. So far I have been able to use the
jar in org/apache/iceberg/iceberg-flink-runtime-1.18 to connect to RabbitMQ and
obtain the data from some streams, so I'd say the source side is working.
After that I have been trying to find a way to send the data in those streams
to Iceberg in S3 through Nessie Catalog which is the one I have working. I have
been using this pipeline with both Spark and Trino for some time now so I know
it is working. Now what I am "simply" trying to do is to use my already set up
Nessie catalog through flink.
I have tried to connect both directly through the sql-client.sh in the bin of
pyflink dir and through python as
table_env.execute_sql(f"""
CREATE CATALOG nessie WITH (
'type'='iceberg',
'catalog-impl'='org.apache.iceberg.nessie.NessieCatalog',
'io-impl'='org.apache.iceberg.aws.s3.S3FileIO',
'uri'='http://mynessieip:mynessieport/api/v1',
'ref'='main',
'warehouse'='s3a://mys3warehouse',
's3-access-key-id'='{USER_AWS}',
's3-secret-access-key'='{KEY_AWS}',
's3-path-style-access'='true',
'client-region'='eu-west-1')""")
The Jars I have included (One of the many combinations I've tried with no
result) in my pyflink/lib dir (i also tried to add them with env.add_jars or
--jarfile) are:
* hadoop-aws-3.4.0.jar
*
iceberg-flink-runtime-1.18-1.5.0.jar
*
hadoop-client-3.4.0.jar
*
hadoop-common-3.4.0.jar
*
hadoop-hdfs-3.4.0.jar
Right now I am getting the following error message:
py4j.protocol.Py4JJavaError: An error occurred while calling o56.executeSql.
: java.lang.NoClassDefFoundError: org/apache/hadoop/hdfs/HdfsConfiguration (...)
Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.hdfs.HdfsConfiguration...
But I have gotten several different errors in all the different Jar
combinations I have tried. So my request is, does anybody know if my problem is
JAR related or if I am doing something else wrong? I would be immensely
grateful if someone could guide me to the right steps to implement this
pipeline.
Thanks:)