Thanks a lot for your message. This could be a bug in Flink: it seems that
the archival of the execution graph is failing because some classes have
already been unloaded.

What I observe from your stack traces is that some classes are loaded from
flink-dist_2.11-1.11.2.jar, while others are loaded from
template-common-jar-0.0.1.jar. Maybe Flink is closing the user-code
classloader, and this is causing the exception during the archival of the
execution graph. Can you make sure that the core Flink classes are on your
classpath only once (in flink-dist), and that template-common-jar-0.0.1
doesn't bundle the Flink runtime classes? (For example, by setting the
Flink dependencies to "provided" scope when building with the
maven-shade-plugin.)
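To illustrate what I mean, here is a minimal sketch of such a dependency declaration, assuming a standard Maven build (the artifactId below is an example, not taken from your project; adjust it to whichever Flink modules you actually depend on):

```xml
<!-- Marking Flink's runtime dependencies as "provided" keeps the shade
     plugin from bundling them into the fat jar; the cluster's flink-dist
     supplies them at runtime. The version matches the 1.11.2 from your
     stack trace. -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.11</artifactId>
    <version>1.11.2</version>
    <scope>provided</scope>
</dependency>
```

You could then check with something like "jar tf template-common-jar-0.0.1.jar | grep org/apache/flink" that no Flink runtime classes end up in the user jar. Note that your trace also shows org.slf4j.LoggerFactory being loaded from template-common-jar-0.0.1.jar, so it may be worth excluding the logging bindings from the fat jar as well.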

For the issue while submitting the job, I cannot provide any further
help, because you haven't posted the exception that occurred in the REST
handler. Could you post that exception here as well?

Best wishes,
Robert



On Sun, Apr 25, 2021 at 2:44 PM chenxuying <cxydeve...@163.com> wrote:

> environment:
>
> flinksql 1.12.2
>
> k8s session mode
>
> description:
>
> I got the following error log when my Kafka connector port was wrong:
>
> >>>>>
>
> 2021-04-25 16:49:50
>
> org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms
> expired before the position for partition filebeat_json_install_log-3 could
> be determined
>
> >>>>>
>
>
> I got the following error log when my Kafka connector IP was wrong:
>
> >>>>>
>
> 2021-04-25 20:12:53
>
> org.apache.flink.runtime.JobException: Recovery is suppressed by
> NoRestartBackoffTimeStrategy
>
> at
> org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:118)
>
> at
> org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:80)
>
> at
> org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:233)
>
> at
> org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:224)
>
> at
> org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:215)
>
> at
> org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:669)
>
> at
> org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:89)
>
> at
> org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:447)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>
> at java.lang.reflect.Method.invoke(Method.java:498)
>
> at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:305)
>
> at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:212)
>
> at
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77)
>
> at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158)
>
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
>
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
>
> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>
> at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
>
> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
>
> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>
> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>
> at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
>
> at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
>
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
>
> at akka.actor.ActorCell.invoke(ActorCell.scala:561)
>
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
>
> at akka.dispatch.Mailbox.run(Mailbox.scala:225)
>
> at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
>
> at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>
> at
> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>
> at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>
> at
> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
> Caused by: org.apache.kafka.common.errors.TimeoutException: Timeout
> expired while fetching topic metadata
>
> >>>>>
>
>
> When the job was cancelled, there was the following error log:
>
> >>>>>
>
> 2021-04-25 08:53:41,115 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job
> v2_ods_device_action_log (fcc451b8a521398b10e5b86153141fbf) switched from
> state CANCELLING to CANCELED.
>
> 2021-04-25 08:53:41,115 INFO
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Stopping
> checkpoint coordinator for job fcc451b8a521398b10e5b86153141fbf.
>
> 2021-04-25 08:53:41,115 INFO
> org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore [] -
> Shutting down
>
> 2021-04-25 08:53:41,115 INFO
> org.apache.flink.runtime.checkpoint.CompletedCheckpoint      [] -
> Checkpoint with ID 1 at
> 'oss://tanwan-datahub/test/flinksql/checkpoints/fcc451b8a521398b10e5b86153141fbf/chk-1'
> not discarded.
>
> 2021-04-25 08:53:41,115 INFO
> org.apache.flink.runtime.checkpoint.CompletedCheckpoint      [] -
> Checkpoint with ID 2 at
> 'oss://tanwan-datahub/test/flinksql/checkpoints/fcc451b8a521398b10e5b86153141fbf/chk-2'
> not discarded.
>
> 2021-04-25 08:53:41,116 INFO
> org.apache.flink.runtime.checkpoint.CompletedCheckpoint      [] -
> Checkpoint with ID 3 at
> 'oss://tanwan-datahub/test/flinksql/checkpoints/fcc451b8a521398b10e5b86153141fbf/chk-3'
> not discarded.
>
> 2021-04-25 08:53:41,137 INFO
> org.apache.flink.runtime.dispatcher.StandaloneDispatcher     [] - Job
> fcc451b8a521398b10e5b86153141fbf reached globally terminal state CANCELED.
>
> 2021-04-25 08:53:41,148 INFO
> org.apache.flink.runtime.jobmaster.JobMaster                 [] - Stopping
> the JobMaster for job
> v2_ods_device_action_log(fcc451b8a521398b10e5b86153141fbf).
>
> 2021-04-25 08:53:41,151 INFO
> org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl     [] -
> Suspending SlotPool.
>
> 2021-04-25 08:53:41,151 INFO
> org.apache.flink.runtime.jobmaster.JobMaster                 [] - Close
> ResourceManager connection 5bdeb8d0f65a90ecdfafd7f102050b19: JobManager is
> shutting down..
>
> 2021-04-25 08:53:41,151 INFO
> org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl     [] - Stopping
> SlotPool.
>
> 2021-04-25 08:53:41,151 INFO
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] -
> Disconnect job manager 00000000000000000000000000000000
> @akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_3 for job
> fcc451b8a521398b10e5b86153141fbf from the resource manager.
>
> 2021-04-25 08:53:41,178 INFO
> org.apache.flink.runtime.dispatcher.StandaloneDispatcher     [] - Could not
> archive completed job
> v2_ods_device_action_log(fcc451b8a521398b10e5b86153141fbf) to the history
> server.
>
> java.util.concurrent.CompletionException:
> java.lang.ExceptionInInitializerError
>
> at
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
> ~[?:1.8.0_265]
>
> at
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
> [?:1.8.0_265]
>
> at
> java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1643)
> [?:1.8.0_265]
>
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [?:1.8.0_265]
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [?:1.8.0_265]
>
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_265]
>
> Caused by: java.lang.ExceptionInInitializerError
>
> at
> org.apache.flink.runtime.dispatcher.JsonResponseHistoryServerArchivist.lambda$archiveExecutionGraph$0(JsonResponseHistoryServerArchivist.java:55)
> ~[flink-dist_2.11-1.11.2.jar:1.11.2]
>
> at
> org.apache.flink.util.function.ThrowingRunnable.lambda$unchecked$0(ThrowingRunnable.java:50)
> ~[template-common-jar-0.0.1.jar:?]
>
> at
> java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
> ~[?:1.8.0_265]
>
> ... 3 more
>
> Caused by: java.lang.IllegalStateException: zip file closed
>
> at java.util.zip.ZipFile.ensureOpen(ZipFile.java:686) ~[?:1.8.0_265]
>
> at java.util.zip.ZipFile.getEntry(ZipFile.java:315) ~[?:1.8.0_265]
>
> at java.util.jar.JarFile.getEntry(JarFile.java:240) ~[?:1.8.0_265]
>
> at sun.net.www.protocol.jar.URLJarFile.getEntry(URLJarFile.java:128)
> ~[?:1.8.0_265]
>
> at java.util.jar.JarFile.getJarEntry(JarFile.java:223) ~[?:1.8.0_265]
>
> at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:1054)
> ~[?:1.8.0_265]
>
> at sun.misc.URLClassPath.getResource(URLClassPath.java:249) ~[?:1.8.0_265]
>
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366) ~[?:1.8.0_265]
>
> at java.net.URLClassLoader$1.run(URLClassLoader.java:363) ~[?:1.8.0_265]
>
> at java.security.AccessController.doPrivileged(Native Method)
> ~[?:1.8.0_265]
>
> at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
> ~[?:1.8.0_265]
>
> at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_265]
>
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
> ~[?:1.8.0_265]
>
> at java.lang.ClassLoader.loadClass(ClassLoader.java:351) ~[?:1.8.0_265]
>
> at java.lang.Class.forName0(Native Method) ~[?:1.8.0_265]
>
> at java.lang.Class.forName(Class.java:264) ~[?:1.8.0_265]
>
> at org.apache.logging.log4j.util.LoaderUtil.loadClass(LoaderUtil.java:168)
> ~[log4j-api-2.12.1.jar:2.12.1]
>
> at
> org.apache.logging.slf4j.Log4jLogger.createConverter(Log4jLogger.java:416)
> ~[log4j-slf4j-impl-2.12.1.jar:2.12.1]
>
> at org.apache.logging.slf4j.Log4jLogger.<init>(Log4jLogger.java:54)
> ~[log4j-slf4j-impl-2.12.1.jar:2.12.1]
>
> at
> org.apache.logging.slf4j.Log4jLoggerFactory.newLogger(Log4jLoggerFactory.java:39)
> ~[log4j-slf4j-impl-2.12.1.jar:2.12.1]
>
> at
> org.apache.logging.slf4j.Log4jLoggerFactory.newLogger(Log4jLoggerFactory.java:30)
> ~[log4j-slf4j-impl-2.12.1.jar:2.12.1]
>
> at
> org.apache.logging.log4j.spi.AbstractLoggerAdapter.getLogger(AbstractLoggerAdapter.java:54)
> ~[log4j-api-2.12.1.jar:2.12.1]
>
> at
> org.apache.logging.slf4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:30)
> ~[log4j-slf4j-impl-2.12.1.jar:2.12.1]
>
> at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:277)
> ~[template-common-jar-0.0.1.jar:?]
>
> at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:288)
> ~[template-common-jar-0.0.1.jar:?]
>
> at
> org.apache.flink.runtime.history.FsJobArchivist.<clinit>(FsJobArchivist.java:50)
> ~[flink-dist_2.11-1.11.2.jar:1.11.2]
>
> at
> org.apache.flink.runtime.dispatcher.JsonResponseHistoryServerArchivist.lambda$archiveExecutionGraph$0(JsonResponseHistoryServerArchivist.java:55)
> ~[flink-dist_2.11-1.11.2.jar:1.11.2]
>
> at
> org.apache.flink.util.function.ThrowingRunnable.lambda$unchecked$0(ThrowingRunnable.java:50)
> ~[template-common-jar-0.0.1.jar:?]
>
> at
> java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
> ~[?:1.8.0_265]
>
> ... 3 more
>
> >>>>>
>
>
> And then I get the following error log when I run a new job, unless I
> restart the cluster:
>
> >>>>>
>
> 2021-04-25 08:54:06,711 INFO  org.apache.flink.client.ClientUtils
>                 [] - Starting program (detached: true)
>
> 2021-04-25 08:54:06,715 INFO
> org.apache.flink.contrib.streaming.state.RocksDBStateBackend [] - Using
> predefined options: DEFAULT.
>
> 2021-04-25 08:54:06,715 INFO
> org.apache.flink.contrib.streaming.state.RocksDBStateBackend [] - Using
> default options factory:
> DefaultConfigurableOptionsFactory{configuredOptions={}}.
>
> 2021-04-25 08:54:06,722 ERROR
> org.apache.flink.runtime.webmonitor.handlers.JarRunHandler   [] - Exception
> occurred in REST handler: Could not execute application.
>
> >>>>>
>
>
>
>
>
