Hi, Eric. Thanks for reaching out. I'm wondering how you use the Table API to ingest the data. Since "OOM" is too general, do you have any more clues about the OOM? Maybe you can use jmap to see what occupies most of the memory. If you find it, you can try to figure out the reason, whether it is caused by a lack of memory or something else.
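For example, something along these lines might help narrow it down (assuming <pid> is a placeholder for the process id of the TaskManager JVM that hits the OOM):

    jmap -histo:live <pid> | head -n 30               # class histogram of live heap objects
    jmap -dump:live,format=b,file=heap.hprof <pid>    # heap dump to analyze offline, e.g. with Eclipse MAT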
Btw, have you ever tried using Flink SQL to ingest the data? Does the OOM still happen?

Best regards,
Yuxia

From: "Yang Liu" <eric.liu....@gmail.com>
To: "User" <user@flink.apache.org>
Sent: Friday, February 10, 2023 5:10:49 AM
Subject: Seeking suggestions for ingesting large amount of data from S3

Hi all,

We are trying to ingest a large amount of data (20 TB) from S3 using the Flink filesystem connector to bootstrap a Hudi table. The data is well partitioned in S3 by date/time, but we have been facing OOM issues in the Flink jobs, so we want to update the Flink job to ingest the data chunk by chunk (partition by partition) with some kind of loop instead of all at once. Curious what the recommended way to do this in Flink is. I believe this should be a common use case, so I hope to get some ideas here. We have been using the Table API, but are open to other APIs.

Thanks & Regards,
Eric
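Regarding the partition-by-partition idea above: a minimal sketch of what that loop could look like with the Table API in batch mode, assuming a Hive-style dt=... layout in S3. All table names, schemas, paths, Hudi options, and the date range below are placeholders, not details from this thread; a real Hudi sink typically needs more options than shown.

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    import java.time.LocalDate;

    public class PartitionedBootstrapSketch {
        public static void main(String[] args) throws Exception {
            // Batch mode: each INSERT below runs as a finite job over one partition.
            TableEnvironment tEnv = TableEnvironment.create(
                    EnvironmentSettings.newInstance().inBatchMode().build());

            // Hypothetical filesystem source over the partitioned S3 data.
            tEnv.executeSql(
                "CREATE TABLE s3_source (" +
                "  id STRING, payload STRING, dt STRING" +
                ") PARTITIONED BY (dt) WITH (" +
                "  'connector' = 'filesystem'," +
                "  'path' = 's3://my-bucket/raw-data/'," +
                "  'format' = 'parquet'" +
                ")");

            // Hypothetical Hudi sink table (options simplified for the sketch).
            tEnv.executeSql(
                "CREATE TABLE hudi_sink (" +
                "  id STRING, payload STRING, dt STRING" +
                ") PARTITIONED BY (dt) WITH (" +
                "  'connector' = 'hudi'," +
                "  'path' = 's3://my-bucket/hudi-table/'" +
                ")");

            // Loop over date partitions; await() blocks until each batch job finishes,
            // so only one chunk of data is in flight at a time.
            for (LocalDate d = LocalDate.of(2023, 1, 1);
                 !d.isAfter(LocalDate.of(2023, 1, 31));
                 d = d.plusDays(1)) {
                tEnv.executeSql(
                    "INSERT INTO hudi_sink SELECT id, payload, dt " +
                    "FROM s3_source WHERE dt = '" + d + "'").await();
            }
        }
    }

The partition filter in the WHERE clause keeps each batch job reading only one S3 partition, which bounds the state and memory each job needs compared to scanning all 20 TB at once.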