Hi Kostas and everyone,
I tried changing setFailOnCheckpointingErrors from true to false, and got
the following trace in the Flink GUI when the checkpoint/upload failed. I am
not sure whether it will be of any help in identifying the issue.
BTW, could you please tell me where to find the log file
Hi Kostas, and everyone,
Just an update on my issue. I have tried the following:
* changed the S3-related configuration in Hadoop as suggested by the Hadoop
documentation [1]:
increased /fs.s3a.threads.max/ from 10 to 100, and
/fs.s3a.connection.maximum/ from 15 to 120. For reference, I have only
3 S3 sinks,
Hello Kostas,
Thanks for your time.
I started that job afresh, with the checkpoint interval set to 15 minutes. It
completed the first 13 checkpoints successfully and only started failing from
the 14th. I waited for about 20 more checkpoints, but all of them failed.
Then I cancelled the job, restored from the
Hi Averell,
Did you have other failures before (from which you managed to resume
successfully)?
Can you share a bit more details about your job and potentially the TM/JM
logs?
The only thing I found about this is here
https://forums.aws.amazon.com/thread.jspa?threadID=130172
but Flink does not
Hello everyone,
I have a job that writes some streams into Parquet files in S3. I use
Flink 1.7.2 on EMR 5.21.
My job had been running well, but suddenly it failed to make a checkpoint
with the full stack trace mentioned below. After that failure, the job
restarted from the last successful