No, I guess it's stable.

2021-11-02 22:41:08,276 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --------------------------------------------------------------------------------
2021-11-02 22:41:08,292 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting StandaloneSessionClusterEntrypoint (Version: 1.10.0, Rev:aa4eb8f, Date:07.02.2020 @ 19:18:19 CET)
2021-11-02 22:41:08,292 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - OS current user: flink
2021-11-02 22:41:08,304 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Current Hadoop/Kerberos user: <no hadoop dependency found>
2021-11-02 22:41:08,304 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM: OpenJDK 64-Bit Server VM - Private Build - 1.8/25.292-b10
2021-11-02 22:41:08,306 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Maximum heap size: 2944 MiBytes
2021-11-02 22:41:08,306 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JAVA_HOME: (not set)
2021-11-02 22:41:08,311 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - No Hadoop Dependency available
2021-11-02 22:41:08,311 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM Options:
2021-11-02 22:41:08,313 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xms3072m
2021-11-02 22:41:08,313 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xmx3072m
2021-11-02 22:41:08,313 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog.file=/opt/flink-1.10.0/log/flink-flink-standalonesession-0-xxxxxxjob-0003.log
2021-11-02 22:41:08,313 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog4j.configuration=file:/opt/flink-1.10.0/conf/log4j.properties
2021-11-02 22:41:08,313 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlogback.configurationFile=file:/opt/flink-1.10.0/conf/logback.xml
2021-11-02 22:41:08,314 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Program Arguments:
2021-11-02 22:41:08,317 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --configDir
2021-11-02 22:41:08,318 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - /opt/flink-1.10.0/conf
2021-11-02 22:41:08,318 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --executionMode
2021-11-02 22:41:08,318 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - cluster
2021-11-02 22:41:08,318 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --host
2021-11-02 22:41:08,318 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - xxxxxxjob-0003
2021-11-02 22:41:08,329 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --webui-port
2021-11-02 22:41:08,330 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - 8081
2021-11-02 22:41:08,330 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Classpath: /opt/flink-1.10.0/lib/flink-table-blink_2.12-1.10.0.jar:/opt/flink-1.10.0/lib/flink-table_2.12-1.10.0.jar:/opt/flink-1.10.0/lib/log4j-1.2.17.jar:/opt/flink-1.10.0/lib/slf4j-log4j12-1.7.15.jar:/opt/flink-1.10.0/lib/flink-dist_2.12-1.10.0.jar:::
2021-11-02 22:41:08,330 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --------------------------------------------------------------------------------
2021-11-02 22:41:08,362 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Registered UNIX signal handlers for [TERM, HUP, INT]
2021-11-02 22:41:08,558 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: env.ssh.opts, -l flink -oStrictHostKeyChecking=no
2021-11-02 22:41:08,558 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: cluster.evenly-spread-out-slots, true
2021-11-02 22:41:08,559 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.size, 3072m
2021-11-02 22:41:08,559 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.flink.size, 3072m
2021-11-02 22:41:08,559 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.jvm-metaspace.size, 256m
2021-11-02 22:41:08,559 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 8
2021-11-02 22:41:08,560 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2021-11-02 22:41:08,560 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability, zookeeper
2021-11-02 22:41:08,560 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability.storageDir, file:///mnt/flink/ha/flink_1_10/
2021-11-02 22:41:08,560 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability.zookeeper.quorum, xxxxxx-0001.xxxxxx.xxxxxx:2181,xxxxxx-0002.xxxxxx.xxxxxx:2181,xxxxxx-0003.xxxxxx.xxxxxx:2181
2021-11-02 22:41:08,561 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability.zookeeper.path.root, /flink_1_10
2021-11-02 22:41:08,561 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: high-availability.cluster-id, /flink_1_10_cluster_0001
2021-11-02 22:41:08,561 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: web.upload.dir, /mnt/flink/uploads/flink_1_10
2021-11-02 22:41:08,562 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.backend, filesystem
2021-11-02 22:41:08,562 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.checkpoints.dir, file:///mnt/flink/checkpoints/flink_1_10
2021-11-02 22:41:08,562 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: state.savepoints.dir, file:///mnt/flink/savepoints/flink_1_10
2021-11-02 22:41:09,935 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting StandaloneSessionClusterEntrypoint.
2021-11-02 22:41:09,935 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Install default filesystem.
2021-11-02 22:41:10,405 INFO org.apache.flink.xxxxxx.fs.FileSystem - Hadoop is not in the classpath/dependencies. The extended set of supported File Systems via Hadoop is not available.
2021-11-02 22:41:10,482 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Install security context.
2021-11-02 22:41:10,516 INFO org.apache.flink.runtime.security.modules.HadoopModuleFactory - Cannot create Hadoop Security Module because Hadoop cannot be found in the Classpath.
2021-11-02 22:41:10,615 INFO org.apache.flink.runtime.security.modules.JaasModule - Jaas file will be created as /tmp/jaas-7770543068119743820.conf.
2021-11-02 22:41:10,638 INFO org.apache.flink.runtime.security.SecurityUtils - Cannot install HadoopSecurityContext because Hadoop cannot be found in the Classpath.
2021-11-02 22:41:10,639 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Initializing cluster services.
2021-11-02 22:41:10,744 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils - Trying to start actor system at xxxxxxjob-0003:0
2021-11-02 22:41:13,357 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
2021-11-02 22:41:13,459 INFO akka.remote.Remoting - Starting remoting
2021-11-02 22:41:14,277 INFO akka.remote.Remoting - Remoting started; listening on addresses :[akka.tcp://flink@xxxxxxjob-0003:39977]
2021-11-02 22:41:14,868 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils - Actor system started at akka.tcp://flink@xxxxxxjob-0003:39977
2021-11-02 22:41:14,966 INFO org.apache.flink.runtime.blob.FileSystemBlobStore - Creating highly available BLOB storage directory at file:/mnt/flink/ha/flink_1_10/flink_1_10_cluster_0001/blob
2021-11-02 22:41:14,990 INFO org.apache.flink.runtime.util.ZooKeeperUtils - Enforcing default ACL for ZK connections
2021-11-02 22:41:14,991 INFO org.apache.flink.runtime.util.ZooKeeperUtils - Using '/flink_1_10/flink_1_10_cluster_0001' as Zookeeper namespace.
2021-11-02 22:41:15,247 INFO org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl - Starting
2021-11-02 22:41:15,267 INFO org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper - Client environment:zookeeper.version=3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT
2021-11-02 22:41:15,267 INFO org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper - Client environment:host.name=xxxxxxjob-0003
2021-11-02 22:41:15,281 INFO org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper - Client environment:java.version=1.8.0_292
2021-11-02 22:41:15,281 INFO org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper - Client environment:java.vendor=Private Build
2021-11-02 22:41:15,282 INFO org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper - Client environment:java.home=/usr/lib/jvm/java-8-openjdk-amd64/jre
2021-11-02 22:41:15,282 INFO org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper - Client environment:java.class.path=/opt/flink-1.10.0/lib/flink-table-blink_2.12-1.10.0.jar:/opt/flink-1.10.0/lib/flink-table_2.12-1.10.0.jar:/opt/flink-1.10.0/lib/log4j-1.2.17.jar:/opt/flink-1.10.0/lib/slf4j-log4j12-1.7.15.jar:/opt/flink-1.10.0/lib/flink-dist_2.12-1.10.0.jar:::
2021-11-02 22:41:15,282 INFO org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
2021-11-02 22:41:15,282 INFO org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper - Client environment:java.io.tmpdir=/tmp
2021-11-02 22:41:15,282 INFO org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper - Client environment:java.compiler=<NA>
2021-11-02 22:41:15,282 INFO org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper - Client environment:os.name=Linux
2021-11-02 22:41:15,282 INFO org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper - Client environment:os.arch=amd64
2021-11-02 22:41:15,282 INFO org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper - Client environment:os.version=4.15.0-161-generic
2021-11-02 22:41:15,282 INFO org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper - Client environment:user.name=flink
2021-11-02 22:41:15,282 INFO org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper - Client environment:user.home=/home/flink
2021-11-02 22:41:15,282 INFO org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper - Client environment:user.dir=/opt/flink-1.10.0
2021-11-02 22:41:15,283 INFO org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=xxxxxx-0001.xxxxxx.xxxxxx:2181,xxxxxx-0002.xxxxxx.xxxxxx:2181,xxxxxx-0003.xxxxxx.xxxxxx:2181 sessionTimeout=60000 watcher=org.apache.flink.shaded.curator.org.apache.curator.ConnectionState@27216cd
2021-11-02 22:41:15,377 WARN org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '/tmp/jaas-7770543068119743820.conf'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it.
2021-11-02 22:41:15,379 INFO org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - Opening socket connection to server xxxxxx.35/xxxxxx.35:2181
2021-11-02 22:41:15,386 ERROR org.apache.flink.shaded.curator.org.apache.curator.ConnectionState - Authentication failed
2021-11-02 22:41:15,396 INFO org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - Socket connection established to xxxxxx.35/xxxxxx.35:2181, initiating session
2021-11-02 22:41:15,421 INFO org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn - Session establishment complete on server xxxxxx.35/xxxxxx.35:2181, sessionid = 0x200000086a20007, negotiated timeout = 40000
2021-11-02 22:41:15,425 INFO org.apache.flink.shaded.curator.org.apache.curator.framework.state.ConnectionStateManager - State change: CONNECTED
2021-11-02 22:41:15,438 INFO org.apache.flink.runtime.blob.BlobServer - Created BLOB server storage directory /tmp/blobStore-9cb73f27-11db-4c42-a3fc-9b77f558e722
2021-11-02 22:41:15,451 INFO org.apache.flink.runtime.blob.BlobServer - Started BLOB server at 0.0.0.0:34845 - max concurrent requests: 50 - max backlog: 1000
2021-11-02 22:41:15,496 INFO org.apache.flink.runtime.metrics.MetricRegistryImpl - No metrics reporter configured, no metrics will be exposed/reported.
2021-11-02 22:41:15,509 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils - Trying to start actor system at xxxxxxjob-0003:0
2021-11-02 22:41:15,624 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
2021-11-02 22:41:15,654 INFO akka.remote.Remoting - Starting remoting
2021-11-02 22:41:15,700 INFO akka.remote.Remoting - Remoting started; listening on addresses :[akka.tcp://flink-metrics@xxxxxxjob-0003:38997]
2021-11-02 22:41:15,733 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils - Actor system started at akka.tcp://flink-metrics@xxxxxxjob-0003:38997
2021-11-02 22:41:15,755 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.metrics.dump.MetricQueryService at akka://flink-metrics/user/MetricQueryService .
2021-11-02 22:41:16,379 INFO org.apache.flink.runtime.dispatcher.FileArchivedExecutionGraphStore - Initializing FileArchivedExecutionGraphStore: Storage directory /tmp/executionGraphStore-40cf7548-25fc-4b2b-a6a8-d504eb611847, expiration time 3600000, maximum cache size 52428800 bytes.
2021-11-02 22:41:16,526 INFO org.apache.flink.configuration.Configuration - Config uses fallback configuration key 'jobmanager.rpc.address' instead of key 'rest.address'
2021-11-02 22:41:16,526 INFO org.apache.flink.configuration.Configuration - Config uses fallback configuration key 'rest.port' instead of key 'rest.bind-port'
2021-11-02 22:41:16,536 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Upload directory /mnt/flink/uploads/flink_1_10/flink-web-upload does not exist.
2021-11-02 22:41:16,558 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Created directory /mnt/flink/uploads/flink_1_10/flink-web-upload for file uploads.
2021-11-02 22:41:16,563 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Starting rest endpoint.
2021-11-02 22:41:17,262 INFO org.apache.flink.runtime.webmonitor.WebMonitorUtils - Determined location of main cluster component log file: /opt/flink-1.10.0/log/flink-flink-standalonesession-0-xxxxxxjob-0003.log
2021-11-02 22:41:17,263 INFO org.apache.flink.runtime.webmonitor.WebMonitorUtils - Determined location of main cluster component stdout file: /opt/flink-1.10.0/log/flink-flink-standalonesession-0-xxxxxxjob-0003.out
2021-11-02 22:41:18,135 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Rest endpoint listening at xxxxxxjob-0003:8081
2021-11-02 22:41:18,145 INFO org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Starting ZooKeeperLeaderElectionService ZooKeeperLeaderElectionService{leaderPath='/leader/rest_server_lock'}.
2021-11-02 22:41:18,303 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Web frontend listening at http://xxxxxxjob-0003:8081.
2021-11-02 22:41:18,385 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at akka://flink/user/resourcemanager .
2021-11-02 22:41:18,430 INFO org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Starting ZooKeeperLeaderElectionService ZooKeeperLeaderElectionService{leaderPath='/leader/dispatcher_lock'}.
2021-11-02 22:41:18,431 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService /leader/resource_manager_lock.
2021-11-02 22:41:18,431 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService /leader/dispatcher_lock.
2021-11-02 22:41:18,437 INFO org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Starting ZooKeeperLeaderElectionService ZooKeeperLeaderElectionService{leaderPath='/leader/resource_manager_lock'}.
2021-11-02 23:20:22,682 ERROR org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerLogFileHandler - Failed to transfer file from TaskExecutor 7e1b7db5918004e4160fdecec1bbdad7.
java.util.concurrent.CompletionException: org.apache.flink.util.FlinkException: Could not retrieve file from transient blob store.
    at org.apache.flink.runtime.rest.handler.taskmanager.AbstractTaskManagerFileHandler.lambda$respondToRequest$0(AbstractTaskManagerFileHandler.java:135)
    at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670)
    at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:646)
    at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:416)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:515)
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
    at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.util.FlinkException: Could not retrieve file from transient blob store.
    ... 10 more
Caused by: java.io.FileNotFoundException: Local file /tmp/blobStore-9cb73f27-11db-4c42-a3fc-9b77f558e722/no_job/blob_t-274d3c2d5acd78ced877d898b1877b10b62a64df-590b54325d599a6782a77413691e0a7b does not exist and failed to copy from blob store.
    at org.apache.flink.runtime.blob.BlobServer.getFileInternal(BlobServer.java:516)
    at org.apache.flink.runtime.blob.BlobServer.getFileInternal(BlobServer.java:444)
    at org.apache.flink.runtime.blob.BlobServer.getFile(BlobServer.java:369)
    at org.apache.flink.runtime.rest.handler.taskmanager.AbstractTaskManagerFileHandler.lambda$respondToRequest$0(AbstractTaskManagerFileHandler.java:133)
    ... 9 more
2021-11-02 23:20:22,703 ERROR org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerLogFileHandler - Unhandled exception.
org.apache.flink.util.FlinkException: Could not retrieve file from transient blob store.
    at org.apache.flink.runtime.rest.handler.taskmanager.AbstractTaskManagerFileHandler.lambda$respondToRequest$0(AbstractTaskManagerFileHandler.java:135)
    at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670)
    at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:646)
    at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:416)
    at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:515)
    at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
    at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: Local file /tmp/blobStore-9cb73f27-11db-4c42-a3fc-9b77f558e722/no_job/blob_t-274d3c2d5acd78ced877d898b1877b10b62a64df-590b54325d599a6782a77413691e0a7b does not exist and failed to copy from blob store.
    at org.apache.flink.runtime.blob.BlobServer.getFileInternal(BlobServer.java:516)
    at org.apache.flink.runtime.blob.BlobServer.getFileInternal(BlobServer.java:444)
    at org.apache.flink.runtime.blob.BlobServer.getFile(BlobServer.java:369)
    at org.apache.flink.runtime.rest.handler.taskmanager.AbstractTaskManagerFileHandler.lambda$respondToRequest$0(AbstractTaskManagerFileHandler.java:133)
    ... 9 more
2021-11-02 23:47:57,865 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [xxxxxxjob-0001/xxxxxx.72:37007] failed with java.io.IOException: Connection reset by peer
2021-11-02 23:47:57,912 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@xxxxxxjob-0001:37007] has failed, address is now gated for [50] ms. Reason: [Disassociated]
2021-11-02 23:53:41,565 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [xxxxxxjob-0001/xxxxxx.72:42961] failed with java.io.IOException: Connection reset by peer
2021-11-02 23:53:41,571 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink-metrics@xxxxxxjob-0001:42961] has failed, address is now gated for [50] ms. Reason: [Disassociated]
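
To double-check that the error is not recurring, something like the sketch below could count the "Failed to transfer file from TaskExecutor" entries per hour in the JobManager log. It assumes the log file has one entry per line with the timestamp layout shown above; the function name `blob_errors_per_hour` and the `sample` list are made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Line prefix as it appears in the log above, e.g. "2021-11-02 23:20:22,682 ERROR ..."
LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2},\d{3}\s+(\w+)\s+(.*)$")
NEEDLE = "Failed to transfer file from TaskExecutor"


def blob_errors_per_hour(lines: Iterable[str]) -> Counter:
    """Count matching ERROR entries, keyed by 'YYYY-MM-DD HH'."""
    counts: Counter = Counter()
    for line in lines:
        m = LINE.match(line)
        # Stack-frame lines have no timestamp prefix and are skipped here.
        if m and m.group(2) == "ERROR" and NEEDLE in m.group(3):
            counts[m.group(1)] += 1
    return counts


# Three entries copied from the log above, plus one stack-frame line.
sample = [
    "2021-11-02 23:20:22,682 ERROR org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerLogFileHandler - Failed to transfer file from TaskExecutor 7e1b7db5918004e4160fdecec1bbdad7.",
    "    at java.lang.Thread.run(Thread.java:748)",
    "2021-11-02 23:20:22,703 ERROR org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerLogFileHandler - Unhandled exception.",
    "2021-11-02 23:47:57,865 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [xxxxxxjob-0001/xxxxxx.72:37007] failed with java.io.IOException: Connection reset by peer",
]
print(dict(blob_errors_per_hour(sample)))
```

On a real file the same function can be fed an open file handle, e.g. `blob_errors_per_hour(open(path))`, where `path` is the JobManager log location.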

On Thu, 4 Nov 2021 at 03:45, Guowei Ma <guowei....@gmail.com> wrote:

> >>> Ok I missed the log below. I guess when the task manager was stopped
> >>> this happened.
> I think if the TM had stopped you would not get the log either, but it
> would throw a different exception, "UnknownTaskExecutorException", with a
> message like “No TaskExecutor registered under ”.
>
> >>> But I guess it's ok and not a big issue???
> Does this happen continuously?
>
> Best,
> Guowei
>
>
> On Thu, Nov 4, 2021 at 12:39 AM John Smith <java.dev....@gmail.com> wrote:
>
>> Ok I missed the log below. I guess when the task manager was stopped this
>> happened.
>>
>> I attached the full sequence. But I guess it's ok and not a big issue???
>>
>>
>> 2021-11-02 23:20:22,682 ERROR org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerLogFileHandler - Failed to transfer file from TaskExecutor 7e1b7db5918004e4160fdecec1bbdad7.
>> java.util.concurrent.CompletionException: org.apache.flink.util.FlinkException: Could not retrieve file from transient blob store.
>>     at org.apache.flink.runtime.rest.handler.taskmanager.AbstractTaskManagerFileHandler.lambda$respondToRequest$0(AbstractTaskManagerFileHandler.java:135)
>>     at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670)
>>     at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:646)
>>     at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:416)
>>     at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:515)
>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
>>     at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>>     at java.lang.Thread.run(Thread.java:748)
>> Caused by: org.apache.flink.util.FlinkException: Could not retrieve file from transient blob store.
>>     ... 10 more
>> Caused by: java.io.FileNotFoundException: Local file /tmp/blobStore-9cb73f27-11db-4c42-a3fc-9b77f558e722/no_job/blob_t-274d3c2d5acd78ced877d898b1877b10b62a64df-590b54325d599a6782a77413691e0a7b does not exist and failed to copy from blob store.
>>     at org.apache.flink.runtime.blob.BlobServer.getFileInternal(BlobServer.java:516)
>>     at org.apache.flink.runtime.blob.BlobServer.getFileInternal(BlobServer.java:444)
>>     at org.apache.flink.runtime.blob.BlobServer.getFile(BlobServer.java:369)
>>     at org.apache.flink.runtime.rest.handler.taskmanager.AbstractTaskManagerFileHandler.lambda$respondToRequest$0(AbstractTaskManagerFileHandler.java:133)
>>     ... 9 more
>> 2021-11-02 23:20:22,703 ERROR org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerLogFileHandler - Unhandled exception.
>> org.apache.flink.util.FlinkException: Could not retrieve file from transient blob store.
>>     at org.apache.flink.runtime.rest.handler.taskmanager.AbstractTaskManagerFileHandler.lambda$respondToRequest$0(AbstractTaskManagerFileHandler.java:135)
>>     at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670)
>>     at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:646)
>>     at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:416)
>>     at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:515)
>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
>>     at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>>     at java.lang.Thread.run(Thread.java:748)
>> Caused by: java.io.FileNotFoundException: Local file /tmp/blobStore-9cb73f27-11db-4c42-a3fc-9b77f558e722/no_job/blob_t-274d3c2d5acd78ced877d898b1877b10b62a64df-590b54325d599a6782a77413691e0a7b does not exist and failed to copy from blob store.
>>     at org.apache.flink.runtime.blob.BlobServer.getFileInternal(BlobServer.java:516)
>>     at org.apache.flink.runtime.blob.BlobServer.getFileInternal(BlobServer.java:444)
>>     at org.apache.flink.runtime.blob.BlobServer.getFile(BlobServer.java:369)
>>     at org.apache.flink.runtime.rest.handler.taskmanager.AbstractTaskManagerFileHandler.lambda$respondToRequest$0(AbstractTaskManagerFileHandler.java:133)
>>     ... 9 more
>>
>> On Wed, 3 Nov 2021 at 02:48, Guowei Ma <guowei....@gmail.com> wrote:
>>
>>> Hi, Smith
>>>
>>> It seems that the log file
>>> (blob_t-274d3c2d5acd78ced877d898b1877b10b62a64df-590b54325d599a6782a77413691e0a7b)
>>> was deleted for some reason. AFAIK nobody else has reported this
>>> exception. (Maybe others know what happened.)
>>> 1. I think if you refresh the page you will see the correct result,
>>> because that triggers another file retrieval from the TM.
>>> 2. It might also be safer to set a dedicated blob directory path
>>> (other than /tmp) via `blob.storage.directory` [1].
>>>
>>> [1]
>>> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/config/#blob-storage-directory
>>>
>>>
>>> Best,
>>> Guowei
>>>
>>>
>>> On Wed, Nov 3, 2021 at 7:50 AM John Smith <java.dev....@gmail.com>
>>> wrote:
>>>
>>>> Hi, I'm running Flink 1.10.0 with 3 ZooKeepers, 3 job nodes and 3 task
>>>> nodes, and I saw this exception in the job node logs...
>>>> 2021-11-02 23:20:22,703 ERROR org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerLogFileHandler - Unhandled exception.
>>>> org.apache.flink.util.FlinkException: Could not retrieve file from transient blob store.
>>>>     at org.apache.flink.runtime.rest.handler.taskmanager.AbstractTaskManagerFileHandler.lambda$respondToRequest$0(AbstractTaskManagerFileHandler.java:135)
>>>>     at java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:670)
>>>>     at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:646)
>>>>     at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
>>>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
>>>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:416)
>>>>     at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:515)
>>>>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
>>>>     at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>>>>     at java.lang.Thread.run(Thread.java:748)
>>>> Caused by: java.io.FileNotFoundException: Local file /tmp/blobStore-9cb73f27-11db-4c42-a3fc-9b77f558e722/no_job/blob_t-274d3c2d5acd78ced877d898b1877b10b62a64df-590b54325d599a6782a77413691e0a7b does not exist and failed to copy from blob store.
>>>>     at org.apache.flink.runtime.blob.BlobServer.getFileInternal(BlobServer.java:516)
>>>>     at org.apache.flink.runtime.blob.BlobServer.getFileInternal(BlobServer.java:444)
>>>>     at org.apache.flink.runtime.blob.BlobServer.getFile(BlobServer.java:369)
>>>>     at org.apache.flink.runtime.rest.handler.taskmanager.AbstractTaskManagerFileHandler.lambda$respondToRequest$0(AbstractTaskManagerFileHandler.java:133)
>>>>     ... 9 more
>>>>
>>>
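
Following up on the `blob.storage.directory` suggestion quoted above: the startup log shows the BLOB server directory being created under /tmp, so a quick check of flink-conf.yaml could look like the sketch below. It assumes the file uses flat `key: value` lines; the helper names and the /mnt/flink/blobs/flink_1_10 path are made up for illustration and do not come from the thread.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional


def parse_flink_conf(text: str) -> Dict[str, str]:
    """Parse flat 'key: value' lines, assumed to be the flink-conf.yaml layout."""
    conf: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)  # split on the first colon only
        conf[key.strip()] = value.strip()
    return conf


def blob_dir_warning(conf: Dict[str, str]) -> Optional[str]:
    """Return a warning if blob storage is unset or under /tmp, else None."""
    path = conf.get("blob.storage.directory")
    if path is None:
        return "blob.storage.directory is not set"
    if PurePosixPath(path).parts[:2] == ("/", "tmp"):
        return f"blob.storage.directory points into /tmp: {path}"
    return None


# Hypothetical config fragment; the blob path is an assumed example value.
sample = """
state.backend: filesystem
# dedicated location outside /tmp (illustrative only)
blob.storage.directory: /mnt/flink/blobs/flink_1_10
"""
print(blob_dir_warning(parse_flink_conf(sample)))
```

A warning string means the directory is either unset or somewhere a /tmp cleanup job could remove files, which is one possible explanation for the missing blob file; the thread does not confirm the actual cause.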