Thanks Feng. I think my challenge (and why I expected I’d need to use Java) is that Parquet files with different schemas will be landing in the S3 bucket, so I don’t want to hard-code the schema in a SQL table definition.
I’m not sure if this is even possible? Maybe I would have to write a job that accepts the schema, directory and Iceberg target table as params and start instances of the job through the job API. Unless reading the Parquet to a temporary table doesn’t need the schema definition? I couldn't really work things out from the links.

Dan

________________________________
From: Feng Jin <jinfeng1...@gmail.com>
Sent: Thursday, November 23, 2023 6:49:11 PM
To: Oxlade, Dan <dan.oxl...@troweprice.com>
Cc: user@flink.apache.org <user@flink.apache.org>
Subject: [EXTERNAL] Re: flink s3[parquet] -> s3[iceberg]

Hi Oxlade

I think using Flink SQL can conveniently fulfill your requirements.

For S3 Parquet files, you can create a temporary table using a filesystem connector[1]. For Iceberg tables, Flink SQL can easily integrate with the Iceberg catalog[2].

Therefore, you can use Flink SQL to export S3 files to Iceberg.

If you only need field mapping or transformation, I believe using Flink SQL + UDF (User-Defined Functions) would be sufficient to meet your needs.

[1]. https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/filesystem/#directory-watching
[2]. https://iceberg.apache.org/docs/latest/flink-connector/#table-managed-in-hadoop-catalog

Best,
Feng

On Thu, Nov 23, 2023 at 11:23 PM Oxlade, Dan <dan.oxl...@troweprice.com> wrote:

Hi all,

I’m attempting to create a POC in Flink to create a pipeline to stream Parquet to a data warehouse in Iceberg format.

Ideally, I’d like to watch a directory in S3 (MinIO locally) and stream those files to Iceberg, doing the appropriate schema mapping/translation.

I guess first: does this sound like a crazy idea? Assuming not, is anyone able to share examples that might get me going? I’ve found lots of Iceberg and Flink SQL examples, but I think I’ll need something in Java to do the schema mapping. Also, examples of reading Parquet from S3 seem a little hard to come by.

I’m aware I’ll need a catalog; I can use Nessie for the prototype. I’m also trying to use MinIO to get this all working locally, but this might just be adding complexity at the moment.

TIA
Dan

T. Rowe Price International Ltd (registered number 3957748) is registered in England and Wales with its registered office at Warwick Court, 5 Paternoster Square, London EC4M 7DX. T. Rowe Price International Ltd is authorised and regulated by the Financial Conduct Authority. The company has a branch in Dubai International Financial Centre (regulated by the DFSA as a Representative Office). T. Rowe Price (including T. Rowe Price International Ltd and its affiliates) and its associates do not provide legal or tax advice.
Any tax-related discussion contained in this e-mail, including any attachments, is not intended or written to be used, and cannot be used, for the purpose of (i) avoiding any tax penalties or (ii) promoting, marketing, or recommending to any other party any transaction or matter addressed herein. Please consult your independent legal counsel and/or professional tax advisor regarding any legal or tax issues raised in this e-mail. The contents of this e-mail and any attachments are intended solely for the use of the named addressee(s) and may contain confidential and/or privileged information. Any unauthorized use, copying, disclosure, or distribution of the contents of this e-mail is strictly prohibited by the sender and may be unlawful. If you are not the intended recipient, please notify the sender immediately and delete this e-mail.
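
The idea raised in this thread, a job that takes the schema, the source directory and the Iceberg target table as parameters instead of hard-coding a table definition, could be sketched roughly as below. This is only an illustration, not something either poster wrote: it builds the Flink SQL text and does not talk to Flink itself. The function names, the `parquet_source` table name, the example bucket path and the `nessie_catalog.db.events` target are all hypothetical, and the sketch assumes the Iceberg catalog and target table have already been created. The `WITH` options are the filesystem connector options described at link [1].

```python
from typing import Dict, List


def quote_identifier(name: str) -> str:
    """Wrap a column name in backticks; reject names that contain one."""
    if "`" in name:
        raise ValueError(f"unsupported column name: {name!r}")
    return f"`{name}`"


def quote_literal(value: str) -> str:
    """Render a SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_statements(
    source_path: str,
    columns: Dict[str, str],
    target_table: str,
    monitor_interval: str = "1 min",
) -> List[str]:
    """Build the two statements for one Parquet-to-Iceberg pipeline.

    columns maps column names to Flink SQL types, in declaration order.
    target_table is used as given (assumption: it names an existing
    Iceberg table in an already registered catalog).
    """
    if not columns:
        raise ValueError("at least one column is required")

    column_lines = ",\n  ".join(
        f"{quote_identifier(name)} {sql_type}" for name, sql_type in columns.items()
    )

    # Temporary source table over the watched directory (hypothetical name).
    create_source = (
        "CREATE TEMPORARY TABLE parquet_source (\n"
        f"  {column_lines}\n"
        ") WITH (\n"
        "  'connector' = 'filesystem',\n"
        f"  'path' = {quote_literal(source_path)},\n"
        "  'format' = 'parquet',\n"
        f"  'source.monitor-interval' = {quote_literal(monitor_interval)}\n"
        ")"
    )

    # Copy everything from the source table into the Iceberg table.
    insert = f"INSERT INTO {target_table} SELECT * FROM parquet_source"
    return [create_source, insert]


if __name__ == "__main__":
    # Hypothetical parameters for one pipeline instance.
    for statement in build_statements(
        "s3://example-bucket/incoming/events/",
        {"id": "BIGINT", "name": "STRING"},
        "nessie_catalog.db.events",
    ):
        print(statement + ";\n")
```

Each differently shaped set of Parquet files would get its own call with its own column map, and the resulting statements could then be submitted through whichever route is preferred, for example the SQL client or a small Table API job. Any field mapping would replace the `SELECT *` with an explicit projection.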