Konstantin, it still does not quite work. The IP is still in place, but…

Here is the Job Manager log:

metrics.reporters: prom
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9249
Starting Job Manager
config file:
jobmanager.rest.address: crabby-kudu-fdp-flink-jobmanager-service
jobmanager.rpc.port: 6123
jobmanager.heap.size: 1024m
taskmanager.heap.size: 1024m
taskmanager.numberOfTaskSlots: 1
parallelism.default: 1
rest.port: 8081
metrics.reporters: prom
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9249
blob.server.port: 6124
query.server.port: 6125
Starting standalonesession as a console application on host crabby-kudu-fdp-flink-jobmanager-85c8d799db-46rj2.
2019-02-21 21:00:37,803 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --------------------------------------------------------------------------------
2019-02-21 21:00:37,804 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting StandaloneSessionClusterEntrypoint (Version: 1.7.1, Rev:89eafb4, Date:14.12.2018 @ 15:48:34 GMT)
2019-02-21 21:00:37,804 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - OS current user: ?
2019-02-21 21:00:37,805 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Current Hadoop/Kerberos user: <no hadoop dependency found>
2019-02-21 21:00:37,805 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.181-b13
2019-02-21 21:00:37,805 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Maximum heap size: 981 MiBytes
2019-02-21 21:00:37,805 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JAVA_HOME: /docker-java-home/jre
2019-02-21 21:00:37,805 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - No Hadoop Dependency available
2019-02-21 21:00:37,805 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM Options:
2019-02-21 21:00:37,805 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xms1024m
2019-02-21 21:00:37,805 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xmx1024m
2019-02-21 21:00:37,805 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties
2019-02-21 21:00:37,806 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml
2019-02-21 21:00:37,806 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Program Arguments:
2019-02-21 21:00:37,806 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --configDir
2019-02-21 21:00:37,806 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - /opt/flink/conf
2019-02-21 21:00:37,806 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --executionMode
2019-02-21 21:00:37,806 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - cluster
2019-02-21 21:00:37,806 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Classpath: /opt/flink/lib/flink-metrics-prometheus-1.7.1.jar:/opt/flink/lib/flink-python_2.11-1.7.1.jar:/opt/flink/lib/flink-queryable-state-runtime_2.11-1.7.1.jar:/opt/flink/lib/flink-table_2.11-1.7.1.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.15.jar:/opt/flink/lib/flink-dist_2.11-1.7.1.jar:::
2019-02-21 21:00:37,806 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --------------------------------------------------------------------------------
2019-02-21 21:00:37,808 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Registered UNIX signal handlers for [TERM, HUP, INT]
2019-02-21 21:00:37,822 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rest.address, crabby-kudu-fdp-flink-jobmanager-service
2019-02-21 21:00:37,822 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2019-02-21 21:00:37,823 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.size, 1024m
2019-02-21 21:00:37,823 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.size, 1024m
2019-02-21 21:00:37,823 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2019-02-21 21:00:37,823 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2019-02-21 21:00:37,824 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: rest.port, 8081
2019-02-21 21:00:37,824 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporters, prom
2019-02-21 21:00:37,825 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporter.prom.class, org.apache.flink.metrics.prometheus.PrometheusReporter
2019-02-21 21:00:37,825 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporter.prom.port, 9249
2019-02-21 21:00:37,825 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: blob.server.port, 6124
2019-02-21 21:00:37,825 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: query.server.port, 6125
2019-02-21 21:00:38,010 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting StandaloneSessionClusterEntrypoint.
2019-02-21 21:00:38,011 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Install default filesystem.
2019-02-21 21:00:38,016 INFO org.apache.flink.core.fs.FileSystem - Hadoop is not in the classpath/dependencies. The extended set of supported File Systems via Hadoop is not available.
2019-02-21 21:00:38,023 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Install security context.
2019-02-21 21:00:38,031 INFO org.apache.flink.runtime.security.modules.HadoopModuleFactory - Cannot create Hadoop Security Module because Hadoop cannot be found in the Classpath.
2019-02-21 21:00:38,043 INFO org.apache.flink.runtime.security.SecurityUtils - Cannot install HadoopSecurityContext because Hadoop cannot be found in the Classpath.
2019-02-21 21:00:38,044 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Initializing cluster services.
2019-02-21 21:00:38,513 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils - Trying to start actor system at 127.0.0.1:6123
2019-02-21 21:00:39,304 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
2019-02-21 21:00:39,411 INFO akka.remote.Remoting - Starting remoting
2019-02-21 21:00:39,570 INFO akka.remote.Remoting - Remoting started; listening on addresses :[akka.tcp://flink@127.0.0.1:6123]
2019-02-21 21:00:39,602 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils - Actor system started at akka.tcp://flink@127.0.0.1:6123
2019-02-21 21:00:39,617 WARN org.apache.flink.configuration.Configuration - Config uses deprecated configuration key 'jobmanager.rpc.address' instead of proper key 'rest.address'
2019-02-21 21:00:39,626 INFO org.apache.flink.runtime.blob.BlobServer - Created BLOB server storage directory /tmp/blobStore-12db5847-9543-43ad-a7fa-19de8e907ed6
2019-02-21 21:00:39,629 INFO org.apache.flink.runtime.blob.BlobServer - Started BLOB server at 0.0.0.0:6124 - max concurrent requests: 50 - max backlog: 1000
2019-02-21 21:00:39,649 INFO org.apache.flink.runtime.metrics.MetricRegistryImpl - Configuring prom with {port=9249, class=org.apache.flink.metrics.prometheus.PrometheusReporter}.
2019-02-21 21:00:39,658 INFO org.apache.flink.metrics.prometheus.PrometheusReporter - Started PrometheusReporter HTTP server on port 9249.
2019-02-21 21:00:39,658 INFO org.apache.flink.runtime.metrics.MetricRegistryImpl - Reporting metrics for reporter prom of type org.apache.flink.metrics.prometheus.PrometheusReporter.
2019-02-21 21:00:39,659 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Trying to start actor system at 127.0.0.1:0
2019-02-21 21:00:39,714 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
2019-02-21 21:00:39,720 INFO akka.remote.Remoting - Starting remoting
2019-02-21 21:00:39,727 INFO akka.remote.Remoting - Remoting started; listening on addresses :[akka.tcp://flink-metrics@127.0.0.1:34006]
2019-02-21 21:00:39,728 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Actor system started at akka.tcp://flink-metrics@127.0.0.1:34006
2019-02-21 21:00:39,797 INFO org.apache.flink.runtime.dispatcher.FileArchivedExecutionGraphStore - Initializing FileArchivedExecutionGraphStore: Storage directory /tmp/executionGraphStore-757ae8c1-c839-4666-9d27-697c34214187, expiration time 3600000, maximum cache size 52428800 bytes.
2019-02-21 21:00:39,821 INFO org.apache.flink.runtime.blob.TransientBlobCache - Created BLOB cache storage directory /tmp/blobStore-71959baf-25bb-4182-864a-5f4873ea9988
2019-02-21 21:00:39,838 WARN org.apache.flink.configuration.Configuration - Config uses deprecated configuration key 'jobmanager.rpc.address' instead of proper key 'rest.address'
2019-02-21 21:00:39,839 WARN org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Upload directory /tmp/flink-web-8dfc9112-0fc2-439f-aac5-2bbe5a003835/flink-web-upload does not exist, or has been deleted externally. Previously uploaded files are no longer available.
2019-02-21 21:00:39,840 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Created directory /tmp/flink-web-8dfc9112-0fc2-439f-aac5-2bbe5a003835/flink-web-upload for file uploads.
2019-02-21 21:00:39,896 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Starting rest endpoint.
2019-02-21 21:00:40,611 WARN org.apache.flink.runtime.webmonitor.WebMonitorUtils - Log file environment variable 'log.file' is not set.
2019-02-21 21:00:40,611 WARN org.apache.flink.runtime.webmonitor.WebMonitorUtils - JobManager log files are unavailable in the web dashboard. Log file location not found in environment variable 'log.file' or configuration key 'Key: 'web.log.path' , default: null (deprecated keys: [jobmanager.web.log.path])'.
2019-02-21 21:00:41,098 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/]] arriving at [akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123] inbound addresses are [akka.tcp://flink@127.0.0.1:6123]
2019-02-21 21:00:41,301 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Rest endpoint listening at 127.0.0.1:8081
2019-02-21 21:00:41,301 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - http://127.0.0.1:8081 was granted leadership with leaderSessionID=00000000-0000-0000-0000-000000000000
2019-02-21 21:00:41,301 INFO org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Web frontend listening at http://127.0.0.1:8081 .
2019-02-21 21:00:41,598 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at akka://flink/user/resourcemanager .
2019-02-21 21:00:41,616 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/dispatcher .
2019-02-21 21:00:41,711 INFO org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - ResourceManager akka.tcp://flink@127.0.0.1:6123/user/resourcemanager was granted leadership with fencing token 00000000000000000000000000000000
2019-02-21 21:00:41,712 INFO org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Starting the SlotManager.
2019-02-21 21:00:41,807 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Dispatcher akka.tcp://flink@127.0.0.1:6123/user/dispatcher was granted leadership with fencing token 00000000-0000-0000-0000-000000000000
2019-02-21 21:00:41,898 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Recovering all persisted jobs.
2019-02-21 21:00:44,420 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/]] arriving at [akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123] inbound addresses are [akka.tcp://flink@127.0.0.1:6123]
2019-02-21 21:01:00,434 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/]] arriving at [akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123] inbound addresses are [akka.tcp://flink@127.0.0.1:6123]
2019-02-21 21:01:04,353 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/]] arriving at [akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123] inbound addresses are [akka.tcp://flink@127.0.0.1:6123]
2019-02-21 21:01:20,474 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/]] arriving at [akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123] inbound addresses are [akka.tcp://flink@127.0.0.1:6123]
2019-02-21 21:01:24,393 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/]] arriving at [akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123] inbound addresses are [akka.tcp://flink@127.0.0.1:6123]
2019-02-21 21:01:40,514 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/]] arriving at [akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123] inbound addresses are [akka.tcp://flink@127.0.0.1:6123]
2019-02-21 21:01:44,433 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/]] arriving at [akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123] inbound addresses are [akka.tcp://flink@127.0.0.1:6123]
2019-02-21 21:02:00,554 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/]] arriving at [akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123] inbound addresses are [akka.tcp://flink@127.0.0.1:6123]
2019-02-21 21:02:04,473 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/]] arriving at [akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123] inbound addresses are [akka.tcp://flink@127.0.0.1:6123]
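
One possible reading of the errors above, offered as an assumption rather than a confirmed diagnosis: the Job Manager's actor system listens on akka.tcp://flink@127.0.0.1:6123, while messages arrive addressed to crabby-kudu-fdp-flink-jobmanager-service:6123, so Akka drops them as non-local. The Job Manager config dump at the top of this log sets jobmanager.rest.address but no jobmanager.rpc.address. A minimal sketch of the Job Manager's flink-conf.yaml, assuming the missing RPC address is the cause, would set that key to the same service name:

```yaml
# Job Manager flink-conf.yaml (sketch, not a verified fix).
# Assumption: with the RPC address set to the service name, the actor system
# accepts messages addressed to crabby-kudu-fdp-flink-jobmanager-service:6123.
jobmanager.rpc.address: crabby-kudu-fdp-flink-jobmanager-service
jobmanager.rpc.port: 6123
# Remaining keys are unchanged from the config dump in the log.
rest.port: 8081
blob.server.port: 6124
query.server.port: 6125
```

If this guess is right, the "Trying to start actor system at ..." line in the Job Manager log would be expected to show the service name instead of 127.0.0.1, and the "dropping message" errors should stop.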

And here is the Task Manager log:

metrics.reporters: prom
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9249
Starting Task Manager
taskmanager.host : 10.131.2.148
config file:
jobmanager.rpc.address: crabby-kudu-fdp-flink-jobmanager-service
jobmanager.rpc.port: 6123
jobmanager.heap.size: 1024m
taskmanager.heap.size: 1024m
taskmanager.numberOfTaskSlots: 16
parallelism.default: 1
rest.port: 8081
metrics.reporters: prom
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9249
taskmanager.host : 10.131.2.148
blob.server.port: 6124
query.server.port: 6125
Starting taskexecutor as a console application on host crabby-kudu-fdp-flink-taskmanager-9f548f744-xlfqg.
2019-02-21 21:00:38,013 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - --------------------------------------------------------------------------------
2019-02-21 21:00:38,014 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Starting TaskManager (Version: 1.7.1, Rev:89eafb4, Date:14.12.2018 @ 15:48:34 GMT)
2019-02-21 21:00:38,014 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - OS current user: ?
2019-02-21 21:00:38,014 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Current Hadoop/Kerberos user: <no hadoop dependency found>
2019-02-21 21:00:38,015 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.181-b13
2019-02-21 21:00:38,015 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Maximum heap size: 922 MiBytes
2019-02-21 21:00:38,015 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - JAVA_HOME: /docker-java-home/jre
2019-02-21 21:00:38,015 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - No Hadoop Dependency available
2019-02-21 21:00:38,015 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - JVM Options:
2019-02-21 21:00:38,015 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - -XX:+UseG1GC
2019-02-21 21:00:38,015 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - -Xms922M
2019-02-21 21:00:38,015 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - -Xmx922M
2019-02-21 21:00:38,015 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - -XX:MaxDirectMemorySize=8388607T
2019-02-21 21:00:38,016 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties
2019-02-21 21:00:38,016 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml
2019-02-21 21:00:38,016 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Program Arguments:
2019-02-21 21:00:38,016 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - --configDir
2019-02-21 21:00:38,016 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - /opt/flink/conf
2019-02-21 21:00:38,016 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Classpath: /opt/flink/lib/flink-metrics-prometheus-1.7.1.jar:/opt/flink/lib/flink-python_2.11-1.7.1.jar:/opt/flink/lib/flink-queryable-state-runtime_2.11-1.7.1.jar:/opt/flink/lib/flink-table_2.11-1.7.1.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.15.jar:/opt/flink/lib/flink-dist_2.11-1.7.1.jar:::
2019-02-21 21:00:38,016 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - --------------------------------------------------------------------------------
2019-02-21 21:00:38,018 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Registered UNIX signal handlers for [TERM, HUP, INT]
2019-02-21 21:00:38,021 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Maximum number of open file descriptors is 1048576.
2019-02-21 21:00:38,032 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, crabby-kudu-fdp-flink-jobmanager-service
2019-02-21 21:00:38,032 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2019-02-21 21:00:38,032 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.size, 1024m
2019-02-21 21:00:38,032 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.size, 1024m
2019-02-21 21:00:38,033 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 16
2019-02-21 21:00:38,033 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2019-02-21 21:00:38,033 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: rest.port, 8081
2019-02-21 21:00:38,034 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporters, prom
2019-02-21 21:00:38,034 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporter.prom.class, org.apache.flink.metrics.prometheus.PrometheusReporter
2019-02-21 21:00:38,035 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: metrics.reporter.prom.port, 9249
2019-02-21 21:00:38,035 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.host, 10.131.2.148
2019-02-21 21:00:38,035 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: blob.server.port, 6124
2019-02-21 21:00:38,035 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: query.server.port, 6125
2019-02-21 21:00:38,041 INFO org.apache.flink.core.fs.FileSystem - Hadoop is not in the classpath/dependencies. The extended set of supported File Systems via Hadoop is not available.
2019-02-21 21:00:38,060 INFO org.apache.flink.runtime.security.modules.HadoopModuleFactory - Cannot create Hadoop Security Module because Hadoop cannot be found in the Classpath.
2019-02-21 21:00:38,082 INFO org.apache.flink.runtime.security.SecurityUtils - Cannot install HadoopSecurityContext because Hadoop cannot be found in the Classpath.
2019-02-21 21:00:43,278 WARN org.apache.flink.configuration.Configuration - Config uses deprecated configuration key 'jobmanager.rpc.address' instead of proper key 'rest.address'
2019-02-21 21:00:43,281 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Using configured hostname/address for TaskManager: 10.131.2.148.
2019-02-21 21:00:43,283 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils - Trying to start actor system at 10.131.2.148:0
2019-02-21 21:00:43,686 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
2019-02-21 21:00:43,736 INFO akka.remote.Remoting - Starting remoting
2019-02-21 21:00:43,850 INFO akka.remote.Remoting - Remoting started; listening on addresses :[akka.tcp://flink@10.131.2.148:38454]
2019-02-21 21:00:43,857 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils - Actor system started at akka.tcp://flink@10.131.2.148:38454
2019-02-21 21:00:43,864 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Trying to start actor system at 10.131.2.148:0
2019-02-21 21:00:43,881 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
2019-02-21 21:00:43,888 INFO akka.remote.Remoting - Starting remoting
2019-02-21 21:00:43,897 INFO akka.remote.Remoting - Remoting started; listening on addresses :[akka.tcp://flink-metrics@10.131.2.148:34162]
2019-02-21 21:00:43,898 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Actor system started at akka.tcp://flink-metrics@10.131.2.148:34162
2019-02-21 21:00:43,916 INFO org.apache.flink.runtime.metrics.MetricRegistryImpl - Configuring prom with {port=9249, class=org.apache.flink.metrics.prometheus.PrometheusReporter}.
2019-02-21 21:00:43,925 INFO org.apache.flink.metrics.prometheus.PrometheusReporter - Started PrometheusReporter HTTP server on port 9249.
2019-02-21 21:00:43,926 INFO org.apache.flink.runtime.metrics.MetricRegistryImpl - Reporting metrics for reporter prom of type org.apache.flink.metrics.prometheus.PrometheusReporter.
2019-02-21 21:00:43,932 INFO org.apache.flink.runtime.blob.PermanentBlobCache - Created BLOB cache storage directory /tmp/blobStore-da779bfd-52ab-4e50-ae69-37cc363f0880
2019-02-21 21:00:43,934 INFO org.apache.flink.runtime.blob.TransientBlobCache - Created BLOB cache storage directory /tmp/blobStore-9f8aacaf-dede-45c6-9dba-34969b4adcba
2019-02-21 21:00:43,935 INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Starting TaskManager with ResourceID: 24acb543dbb8a7dd0b3f4f92bce93a8f
2019-02-21 21:00:43,939 INFO org.apache.flink.runtime.io.network.netty.NettyConfig - NettyConfig [server address: /10.131.2.148, server port: 0, ssl enabled: false, memory segment size (bytes): 32768, transport type: NIO, number of server threads: 16 (manual), number of client threads: 16 (manual), server connect backlog: 0 (use Netty's default), client connect timeout (sec): 120, send/receive buffer size (bytes): 0 (use Netty's default)]
2019-02-21 21:00:43,978 INFO org.apache.flink.runtime.taskexecutor.TaskManagerServices - Temporary file directory '/tmp': total 79 GB, usable 19 GB (24.05% usable)
2019-02-21 21:00:44,050 INFO org.apache.flink.runtime.io.network.buffer.NetworkBufferPool - Allocated 102 MB for network buffer pool (number of memory segments: 3278, bytes per segment: 32768).
2019-02-21 21:00:44,105 INFO org.apache.flink.runtime.io.network.NetworkEnvironment - Starting the network environment and its components.
2019-02-21 21:00:44,141 INFO org.apache.flink.runtime.io.network.netty.NettyClient - Successful initialization (took 34 ms).
2019-02-21 21:00:44,187 INFO org.apache.flink.runtime.io.network.netty.NettyServer - Successful initialization (took 46 ms). Listening on SocketAddress /10.131.2.148:46191.
2019-02-21 21:00:44,194 INFO org.apache.flink.queryablestate.server.KvStateServerImpl - Started Queryable State Server @ /10.131.2.148:9067.
2019-02-21 21:00:44,206 INFO org.apache.flink.queryablestate.client.proxy.KvStateClientProxyImpl - Started Queryable State Proxy Server @ /10.131.2.148:9069.
2019-02-21 21:00:44,207 INFO org.apache.flink.runtime.taskexecutor.TaskManagerServices - Limiting managed memory to 0.7 of the currently free heap space (639 MB), memory will be allocated lazily.
2019-02-21 21:00:44,210 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager uses directory /tmp/flink-io-d1a33d1b-838f-4082-86b7-1ade59bdda8a for spill files.
2019-02-21 21:00:44,280 INFO org.apache.flink.runtime.taskexecutor.TaskManagerConfiguration - Messages have a max timeout of 10000 ms
2019-02-21 21:00:44,291 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint for org.apache.flink.runtime.taskexecutor.TaskExecutor at akka://flink/user/taskmanager_0 .
2019-02-21 21:00:44,305 INFO org.apache.flink.runtime.taskexecutor.JobLeaderService - Start job leader service.
2019-02-21 21:00:44,305 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Connecting to ResourceManager akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/user/resourcemanager(00000000000000000000000000000000).
2019-02-21 21:00:44,306 INFO org.apache.flink.runtime.filecache.FileCache - User file cache uses directory /tmp/flink-dist-cache-807b9b28-6656-4bf9-b5ee-4ce41f3b4513
2019-02-21 21:00:54,330 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "akka.actor.Identify"..
2019-02-21 21:01:14,370 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "akka.actor.Identify"..
2019-02-21 21:01:34,409 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "akka.actor.Identify"..
2019-02-21 21:01:54,449 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "akka.actor.Identify"..
2019-02-21 21:02:14,490 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "akka.actor.Identify"..
2019-02-21 21:02:34,529 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "akka.actor.Identify"..
2019-02-21 21:02:54,569 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "akka.actor.Identify"..
2019-02-21 21:03:14,610 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "akka.actor.Identify"..
2019-02-21 21:03:34,649 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "akka.actor.Identify"..
Something is still not connected Boris Lublinsky FDP Architect boris.lublin...@lightbend.com https://www.lightbend.com/ > On Feb 21, 2019, at 2:05 AM, Konstantin Knauf <konstan...@ververica.com> > wrote: > > Hi Boris, > > the exact command depends on the docker-entrypoint.sh script and the image > you are using. For the example contained in the Flink repository it is > "task-manager", I think. The important thing is to pass "taskmanager.host" to > the Taskmanager process. You can verify by checking the Taskmanager logs. > These should contain lines like below: > > 2019-02-21 08:03:00,004 INFO > org.apache.flink.runtime.taskexecutor.TaskManagerRunner [] - Program > Arguments: > 2019-02-21 08:03:00,008 INFO > org.apache.flink.runtime.taskexecutor.TaskManagerRunner [] - > -Dtaskmanager.host=10.12.10.173 > > In the Jobmanager logs you should see that the Taskmanager is registered > under the IP above in a line similar to: > > 2019-02-21 08:03:26,874 INFO > org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - > Registering TaskManager with ResourceID a0513ba2c472d2d1efc07626da9c1bda > (akka.tcp://flink@10.12.10.173:46531/user/taskmanager_0 > <http://flink@10.12.10.173:46531/user/taskmanager_0>) at ResourceManager > > A service per Taskmanager is not required. The purpose of the config > parameter is that the Jobmanager addresses the taskmanagers by IP instead of > hostname. > > Hope this helps! > > Cheers, > > Konstantin > > > > On Wed, Feb 20, 2019 at 4:37 PM Boris Lublinsky > <boris.lublin...@lightbend.com <mailto:boris.lublin...@lightbend.com>> wrote: > Also, The suggested workaround does not quite work. > 2019-02-20 15:27:43,928 WARN akka.remote.ReliableDeliverySupervisor > - Association with remote system > [akka.tcp://flink-metrics@flink-taskmanager-1:6170 <>] has failed, address is > now gated for [50] ms. 
> Reason: [Association failed with [akka.tcp://flink-metrics@flink-taskmanager-1:6170]] Caused by: [flink-taskmanager-1: No address associated with hostname]
> 2019-02-20 15:27:48,750 ERROR org.apache.flink.runtime.rest.handler.legacy.files.StaticFileServerHandler - Caught exception
>
> I think the problem is that it's trying to connect to flink-taskmanager-1
>
> Using busybox to experiment with nslookup, I can see
> / # nslookup flink-taskmanager-1.flink-taskmanager
> Server: 10.0.11.151
> Address 1: 10.0.11.151 ip-10-0-11-151.us-west-2.compute.internal
>
> Name: flink-taskmanager-1.flink-taskmanager
> Address 1: 10.131.2.136 flink-taskmanager-1.flink-taskmanager.flink.svc.cluster.local
> / # nslookup flink-taskmanager-1
> Server: 10.0.11.151
> Address 1: 10.0.11.151 ip-10-0-11-151.us-west-2.compute.internal
>
> nslookup: can't resolve 'flink-taskmanager-1'
> / # nslookup flink-taskmanager-0.flink-taskmanager
> Server: 10.0.11.151
> Address 1: 10.0.11.151 ip-10-0-11-151.us-west-2.compute.internal
>
> Name: flink-taskmanager-0.flink-taskmanager
> Address 1: 10.131.0.111 flink-taskmanager-0.flink-taskmanager.flink.svc.cluster.local
> / # nslookup flink-taskmanager-0
> Server: 10.0.11.151
> Address 1: 10.0.11.151 ip-10-0-11-151.us-west-2.compute.internal
>
> nslookup: can't resolve 'flink-taskmanager-0'
> / #
>
> So the name should be postfixed with the service name. How do I force it? I suspect I am missing a config parameter.
>
> Boris Lublinsky
> FDP Architect
> boris.lublin...@lightbend.com
> https://www.lightbend.com/
>
>> On Feb 19, 2019, at 4:33 AM, Konstantin Knauf <konstan...@ververica.com> wrote:
>>
>> Hi Boris,
>>
>> the solution is actually simpler than it sounds from the ticket.
>> The only thing you need to do is to set the "taskmanager.host" to the Pod's IP address in the Flink configuration. The easiest way to do this is to pass this config dynamically via a command-line parameter.
>>
>> The Deployment spec could look something like this:
>> containers:
>> - name: taskmanager
>>   [...]
>>   args:
>>   - "taskmanager.sh"
>>   - "start-foreground"
>>   - "-Dtaskmanager.host=$(K8S_POD_IP)"
>>   [...]
>>   env:
>>   - name: K8S_POD_IP
>>     valueFrom:
>>       fieldRef:
>>         fieldPath: status.podIP
>>
>> Hope this helps and let me know if this works.
>>
>> Best,
>>
>> Konstantin
>>
>> On Sun, Feb 17, 2019 at 9:51 PM Boris Lublinsky <boris.lublin...@lightbend.com> wrote:
>> I was looking at this issue https://issues.apache.org/jira/browse/FLINK-11127
>> Apparently there is a workaround for it.
>> Is it possible to provide the complete helm chart for it?
>> Bits and pieces are in the ticket, but it would be nice to see the full chart.
>>
>> Boris Lublinsky
>> FDP Architect
>> boris.lublin...@lightbend.com
>> https://www.lightbend.com/
>>
>> --
>> Konstantin Knauf | Solutions Architect
>> +49 160 91394525
>>
>> <https://www.ververica.com/>
>> Follow us @VervericaData
>> --
>> Join Flink Forward <https://flink-forward.org/> - The Apache Flink Conference
>> Stream Processing | Event Driven | Real Time
>> --
>> Data Artisans GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
>> --
>> Data Artisans GmbH
>> Registered at Amtsgericht Charlottenburg: HRB 158244 B
>> Managing Directors: Dr. Kostas Tzoumas, Dr.
Stephan Ewen
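The nslookup output quoted in this thread shows that the bare pod name (flink-taskmanager-1) does not resolve, while the name qualified with the service (flink-taskmanager-1.flink-taskmanager) does. The sketch below spells out the name forms involved and checks which of them resolve. It assumes the pods sit behind a service named flink-taskmanager in the flink namespace with the default cluster.local domain, as the nslookup output suggests; candidate_names and resolve are hypothetical helper names, not Flink or Kubernetes APIs.

```python
import socket


def candidate_names(pod, service, namespace, cluster_domain="cluster.local"):
    """Return the DNS name forms for a pod behind a service,
    from the bare pod name to the fully qualified name.

    Hypothetical helper; the cluster_domain default is an assumption.
    """
    return [
        pod,
        f"{pod}.{service}",
        f"{pod}.{service}.{namespace}.svc.{cluster_domain}",
    ]


def resolve(name):
    """Return the IPv4 address for name, or None if it does not resolve."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None
```

Calling resolve on each entry of candidate_names("flink-taskmanager-1", "flink-taskmanager", "flink") from inside a pod would show which forms the cluster DNS accepts. Setting taskmanager.host to the pod IP, as suggested earlier in the thread, avoids depending on any of these names.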