Hi Ufuk,

Thanks for replying.
The savepoint path is correct and it exists.
FYI, I used the monitoring REST APIs to cancel the job with savepoint.


[cid:image001.png@01D2A167.3E215140]

Abhinav Bajaj
Lead Engineer
HERE Predictive Analytics
Office:  +12062092767
Mobile: +17083299516

HERE Seattle
701 Pike Street, #2000, Seattle, WA 98101, USA
47° 36' 41" N. 122° 19' 57" W
HERE Maps




From: Ufuk Celebi <u...@apache.org>
Reply-To: "user@flink.apache.org" <user@flink.apache.org>
Date: Monday, March 20, 2017 at 2:41 AM
To: "user@flink.apache.org" <user@flink.apache.org>
Subject: Re: Job fails to start with S3 savepoint

Hey Abhinav,

the Exception is thrown if the S3 object does not exist.

Can you double check that it actually does exist (no typos, etc.)?

Could this be related to accessing a different region than expected?

– Ufuk


On Mon, Mar 20, 2017 at 9:38 AM, Timo Walther 
<twal...@apache.org<mailto:twal...@apache.org>> wrote:
Hi Abhinav,

can you check if you have configured your AWS setup correctly? The S3 
configuration might be missing.

https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/aws.html#missing-s3-filesystem-configuration<https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fci.apache.org%2Fprojects%2Fflink%2Fflink-docs-release-1.3%2Fsetup%2Faws.html%23missing-s3-filesystem-configuration&data=01%7C01%7C%7Cd049ebd9730747304e0208d46f756af4%7C6d4034cd72254f72b85391feaea64919%7C1&sdata=QqbkwWAbOwb97ckBio4Ty7nL7XJC598mYVeN5nr6HXo%3D&reserved=0>

Regards,
Timo


Am 18/03/17 um 01:33 schrieb Bajaj, Abhinav:
Hi,

I am trying to explore using S3 for storing checkpoints and savepoints.
I can get Flink to store the checkpoints and savepoints in s3.

However, when I try to submit the same Job using the stored savepoint, it fails 
with below exception.
I am using Flink 1.2 and submitted the job from the UI dashboard.

Can anyone guide me through this issue?

Thanks,
Abhinav

Jobmanager logs with exception –

2017-03-18 00:10:09,193 INFO  org.apache.flink.runtime.blob.BlobClient          
             - Blob client connecting to akka://flink/user/jobmanager
2017-03-18 00:10:09,348 INFO  org.apache.flink.runtime.client.JobClient         
             - Checking and uploading JAR files
2017-03-18 00:10:09,348 INFO  org.apache.flink.runtime.blob.BlobClient          
             - Blob client connecting to akka://flink/user/jobmanager
2017-03-18 00:10:09,501 INFO  org.apache.flink.yarn.YarnJobManager              
             - Submitting job 4425245091bea9ad103dd3ff338244bb (Session Counter 
Example).
2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager              
             - Using restart strategy NoRestartStrategy for 
4425245091bea9ad103dd3ff338244bb.
2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager              
             - Running initialization on master for job Session Counter Example 
(4425245091bea9ad103dd3ff338244bb).
2017-03-18 00:10:09,502 INFO  org.apache.flink.yarn.YarnJobManager              
             - Successfully ran initialization on master in 0 ms.
2017-03-18 00:10:09,503 INFO  org.apache.flink.yarn.YarnJobManager              
             - Starting job from savepoint 
's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.
2017-03-18 00:10:09,636 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph         - Job Session 
Counter Example (4425245091bea9ad103dd3ff338244bb) switched from state CREATED 
to FAILING.
org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable 
failure. This suppresses job restarts. Please check the stack trace for the 
root cause.
                at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)
                at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)
                at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)
                at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
                at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
                at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
                at 
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
                at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
                at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
                at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.IllegalArgumentException: Invalid path 
's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.
                at 
org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)
                at 
org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)
                at 
org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)
                at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)
                ... 10 more
2017-03-18 00:10:09,638 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph         - Source: Custom 
Source -> Map (1/1) (f7e8f6c8d2030f5773f9d162d9ac2797) switched from CREATED to 
CANCELED.
2017-03-18 00:10:09,639 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph         - 
TriggerWindow(TumblingProcessingTimeWindows(15000), 
ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@f0aadd59},
 ProcessingTimeTrigger(), WindowedStream.apply(WindowedStream.java:521)) -> 
Sink: Unnamed (1/1) (7d1917621cf923445ab904bb60c62bfd) switched from CREATED to 
CANCELED.
2017-03-18 00:10:09,639 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph         - Try to restart 
or fail the job Session Counter Example (4425245091bea9ad103dd3ff338244bb) if 
no longer possible.
2017-03-18 00:10:09,639 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph         - Job Session 
Counter Example (4425245091bea9ad103dd3ff338244bb) switched from state FAILING 
to FAILED.
org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable 
failure. This suppresses job restarts. Please check the stack trace for the 
root cause.
                at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)
                at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)
                at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)
                at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
                at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
                at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
                at 
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
                at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
                at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
                at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.IllegalArgumentException: Invalid path 
's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.
                at 
org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)
                at 
org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)
                at 
org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)
                at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)
                ... 10 more
2017-03-18 00:10:09,640 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph         - Could not 
restart the job Session Counter Example (4425245091bea9ad103dd3ff338244bb) 
because a type of SuppressRestartsException was thrown and the restart strategy 
prevented it.
org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable 
failure. This suppresses job restarts. Please check the stack trace for the 
root cause.
                at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)
                at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)
                at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)
                at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
                at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
                at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
                at 
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
                at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
                at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
                at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.IllegalArgumentException: Invalid path 
's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.
                at 
org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)
                at 
org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)
                at 
org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)
                at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)
                ... 10 more
2017-03-18 00:10:09,640 INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator      - Stopping 
checkpoint coordinator for job 4425245091bea9ad103dd3ff338244bb
2017-03-18 00:10:09,640 INFO  
org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore   - 
Shutting down
2017-03-18 00:18:15,290 INFO  org.apache.flink.runtime.blob.BlobClient          
             - Blob client connecting to akka://flink/user/jobmanager
2017-03-18 00:18:15,443 INFO  org.apache.flink.runtime.client.JobClient         
             - Checking and uploading JAR files
2017-03-18 00:18:15,443 INFO  org.apache.flink.runtime.blob.BlobClient          
             - Blob client connecting to akka://flink/user/jobmanager
2017-03-18 00:18:15,596 INFO  org.apache.flink.yarn.YarnJobManager              
             - Submitting job c965addb24f955a28400f89c2a41db57 (Session Counter 
Example).
2017-03-18 00:18:15,597 INFO  org.apache.flink.yarn.YarnJobManager              
             - Using restart strategy NoRestartStrategy for 
c965addb24f955a28400f89c2a41db57.
2017-03-18 00:18:15,597 INFO  org.apache.flink.yarn.YarnJobManager              
             - Running initialization on master for job Session Counter Example 
(c965addb24f955a28400f89c2a41db57).
2017-03-18 00:18:15,597 INFO  org.apache.flink.yarn.YarnJobManager              
             - Successfully ran initialization on master in 0 ms.
2017-03-18 00:18:15,598 INFO  org.apache.flink.yarn.YarnJobManager              
             - Starting job from savepoint 
's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.
2017-03-18 00:18:15,728 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph         - Job Session 
Counter Example (c965addb24f955a28400f89c2a41db57) switched from state CREATED 
to FAILING.
org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable 
failure. This suppresses job restarts. Please check the stack trace for the 
root cause.
                at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)
                at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)
                at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)
                at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
                at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
                at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
                at 
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
                at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
                at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
                at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.IllegalArgumentException: Invalid path 
's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.
                at 
org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)
                at 
org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)
                at 
org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)
                at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)
                ... 10 more
2017-03-18 00:18:15,729 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph         - Source: Custom 
Source -> Map (1/1) (7f4b79ade953f1e75158fc9ef7a197f4) switched from CREATED to 
CANCELED.
2017-03-18 00:18:15,729 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph         - 
TriggerWindow(TumblingProcessingTimeWindows(15000), 
ListStateDescriptor{serializer=org.apache.flink.api.java.typeutils.runtime.TupleSerializer@f0aadd59},
 ProcessingTimeTrigger(), WindowedStream.apply(WindowedStream.java:521)) -> 
Sink: Unnamed (1/1) (1f7b169898490ee055446ba42d92a0c2) switched from CREATED to 
CANCELED.
2017-03-18 00:18:15,729 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph         - Try to restart 
or fail the job Session Counter Example (c965addb24f955a28400f89c2a41db57) if 
no longer possible.
2017-03-18 00:18:15,729 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph         - Job Session 
Counter Example (c965addb24f955a28400f89c2a41db57) switched from state FAILING 
to FAILED.
org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable 
failure. This suppresses job restarts. Please check the stack trace for the 
root cause.
                at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)
                at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)
                at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)
                at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
                at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
                at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
                at 
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
                at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
                at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
                at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.IllegalArgumentException: Invalid path 
's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.
                at 
org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)
                at 
org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)
                at 
org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)
                at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)
                ... 10 more
2017-03-18 00:18:15,730 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph         - Could not 
restart the job Session Counter Example (c965addb24f955a28400f89c2a41db57) 
because a type of SuppressRestartsException was thrown and the restart strategy 
prevented it.
org.apache.flink.runtime.execution.SuppressRestartsException: Unrecoverable 
failure. This suppresses job restarts. Please check the stack trace for the 
root cause.
                at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1369)
                at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)
                at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1330)
                at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
                at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
                at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
                at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
                at 
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
                at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
                at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
                at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.IllegalArgumentException: Invalid path 
's3://flink-bucket/flink-savepoints/savepoint-0eba6ba712d2'.
                at 
org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.createFsInputStream(SavepointStore.java:182)
                at 
org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.loadSavepoint(SavepointStore.java:131)
                at 
org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:64)
                at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1348)
                ... 10 more
2017-03-18 00:18:15,730 INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator      - Stopping 
checkpoint coordinator for job c965addb24f955a28400f89c2a41db57
2017-03-18 00:18:15,730 INFO  
org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore   - 
Shutting down

[cid:image002.png@01D2A167.3E215140]

Abhinav Bajaj
Lead Engineer
HERE Predictive Analytics
Office:  +12062092767<tel:+1%20206-209-2767>
Mobile: +17083299516<tel:+1%20708-329-9516>

HERE Seattle
701 Pike Street, #2000, Seattle, WA 98101, USA
47° 36' 41" N. 122° 19' 57" W
HERE Maps






Reply via email to