Uploading jar to s3 for persistence

2022-01-09 Thread Puneet Duggal
Hi, I am currently working with a Flink HA cluster with 3 job managers and 3 ZooKeeper nodes. I am also persisting my checkpoints to S3, and have therefore already configured the required flink-s3 jars during Flink job manager and task manager process startup. Now I have configured a variable web.upload.dir

Re: Uploading jar to s3 for persistence

2022-01-10 Thread Piotr Nowojski
Hi Puneet, Have you seen this thread before? [1] It looks like the same issue, and especially this part might be the key:
> Be aware that the filesystem used by the FileUploadHandler
> is java.nio.file.FileSystem and not
> Flink's org.apache.flink.core.fs.FileSystem for which we provide different
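
The distinction quoted above can be illustrated with a minimal sketch. It does not reproduce Flink's FileUploadHandler; it only shows that plain java.nio resolves a location through its installed filesystem providers, and that a stock JDK ships none for the s3 scheme. The class name UploadDirPathDemo, the method resolvableByNio, and the bucket name my-bucket are hypothetical, and the sketch assumes no third-party NIO provider for s3 is on the classpath.

```java
import java.net.URI;
import java.nio.file.FileSystemNotFoundException;
import java.nio.file.Paths;

public class UploadDirPathDemo {

    /**
     * Returns true if java.nio can turn the given URI into a Path using the
     * filesystem providers installed in this JVM, false otherwise.
     */
    public static boolean resolvableByNio(String uri) {
        try {
            // Paths.get(URI) looks up an installed provider by URI scheme.
            Paths.get(URI.create(uri));
            return true;
        } catch (FileSystemNotFoundException | IllegalArgumentException e) {
            // No provider for this scheme, or the URI is not valid for it.
            return false;
        }
    }

    public static void main(String[] args) {
        // A local directory, expressed as a file: URI for this platform.
        String local = Paths.get(System.getProperty("java.io.tmpdir")).toUri().toString();
        // Hypothetical bucket name, used only for illustration.
        String remote = "s3://my-bucket/flink-jars";

        System.out.println(local + " resolvable by java.nio: " + resolvableByNio(local));
        System.out.println(remote + " resolvable by java.nio: " + resolvableByNio(remote));
    }
}
```

Having the flink-s3 jars available does not change this result, because those jars plug into Flink's own org.apache.flink.core.fs.FileSystem rather than into java.nio.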

Re: Uploading jar to s3 for persistence

2022-01-10 Thread Puneet Duggal
Hi Piotr, Thank you for your prompt reply. I went through this thread, and it also mentions that Flink requires the S3 filesystem jars, which are present in my HA Flink cluster. Also, as mentioned in the Apache Flink documentation for Amazon S3 integration, https://nightlies.apache.org/f

Re: Uploading jar to s3 for persistence

2022-01-10 Thread Puneet Duggal
Hi, Please ignore my reply above; I got your point. Just one question: is using java.nio.file.FileSystem instead of Flink’s org.apache.flink.core.fs.FileSystem the expected behavior? I mean, can we raise an issue to use the Flink filesystem instead, as it allows us to use a distributed filesystem as persistent sto

Re: Uploading jar to s3 for persistence

2022-01-10 Thread David Morávek
Hi Puneet, this is a known limitation and unfortunately `web.upload.dir` currently works only with the local filesystem :( There are multiple issues covering this already; I guess FLINK-16544 [1] summarizes the current state well. This is something we want to address in future releases. We've b

Re: Uploading jar to s3 for persistence

2022-01-10 Thread David Morávek
I understand the issue. We currently don't have a good mechanism for this kind of external file management (we need to avoid leaking resources) :( Even right now, we kind of rely on the upload directory being cleaned up by the cluster manager (YARN, k8s), because it's tied to the container lifecycle.