Re: S3 parquet sink - failed with S3 connection exception

2019-03-14 Thread Averell
Hi Kostas and everyone, I changed setFailOnCheckpointingErrors from true to false and got the following trace in the Flink GUI when the checkpoint/upload failed. I am not sure whether it will help in identifying the issue. BTW, could you please tell me where to find the log file

Re: S3 parquet sink - failed with S3 connection exception

2019-03-10 Thread Averell
Hi Kostas and everyone, Just an update on my issue. I have tried: * changing the S3-related configuration in Hadoop as suggested by the Hadoop documentation [1]: increased /fs.s3a.threads.max/ from 10 to 100, and /fs.s3a.connection.maximum/ from 15 to 120. For reference, I have only 3 S3 sinks,

Re: S3 parquet sink - failed with S3 connection exception

2019-03-05 Thread Averell
Hello Kostas, Thanks for your time. I started the job from scratch with the checkpoint interval set to 15 minutes. It completed the first 13 checkpoints successfully and only started failing from the 14th. I waited for about 20 more checkpoints, but all of them failed. Then I cancelled the job, restored from the

Re: S3 parquet sink - failed with S3 connection exception

2019-03-05 Thread Kostas Kloudas
Hi Averell, Did you have other failures before (from which you managed to resume successfully)? Can you share a few more details about your job and, if possible, the TM/JM logs? The only thing I found about this is here https://forums.aws.amazon.com/thread.jspa?threadID=130172 but Flink does not

S3 parquet sink - failed with S3 connection exception

2019-03-04 Thread Averell
Hello everyone, I have a job that writes some streams into Parquet files in S3. I use Flink 1.7.2 on EMR 5.21. My job had been running well, but it suddenly failed to make a checkpoint, with the full stack trace mentioned below. After that failure, the job restarted from the last successful