Hi Averell,

> Then I have another question: when JM cannot start/connect to the JM on .88,
> why didn't it try on .82 where resource are still available?

When you are deploying on YARN, the TM container placement is decided by the
YARN scheduler and not by Flink. Without seeing the complete logs, it is
difficult to tell what happened. If you need help with debugging, please
enable YARN's log aggregation and attach the output of:

    yarn logs -applicationId <APP_ID>
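
In case log aggregation is not yet enabled on your cluster: as far as I know,
it is controlled by the yarn.log-aggregation-enable property in yarn-site.xml.
A minimal sketch, assuming a stock Hadoop setup (your distribution may manage
this file for you, and the NodeManagers typically need a restart to pick up
the change):

```xml
<!-- yarn-site.xml: collect container logs after the application finishes,
     so that "yarn logs -applicationId ..." can retrieve them -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
```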

Do I understand correctly that your problem was solved by changing the
ZooKeeper connection string?

Best,
Gary

On Wed, Jan 23, 2019 at 12:44 PM Averell <lvhu...@gmail.com> wrote:

> Hi Gary,
>
> Thanks for your support.
>
> I use Flink 1.7.0. I will try to test without that -n.
> Below are the JM log (on server .82) and the TM log (on server .88). I'm
> sorry that I missed the TM log before asking; I thought it would not be
> relevant. I just fixed the issue with the connection to ZooKeeper and the
> problem was solved.
>
> Then I have another question: when JM cannot start/connect to the JM on .88,
> why didn't it try on .82 where resource are still available?
>
> Thanks and regards,
> Averell
>
> Here is the JM log (from /mnt/var/log/hadoop-yarn/.../jobmanager.log on .82).
> It seems irrelevant: the earlier NoResourceAvailable message was shown in the
> GUI but is not in the jobmanager.log file:
>
> 2019-01-23 04:15:01.869 [main] WARN org.apache.flink.configuration.Configuration  - Config uses deprecated configuration key 'web.port' instead of proper key 'rest.port'
> 2019-01-23 04:15:03.483 [main] WARN org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint  - Upload directory /tmp/flink-web-08279f45-0244-4c5c-bc9b-299ac59b4068/flink-web-upload does not exist, or has been deleted externally. Previously uploaded files are no longer available.
>
> And here is the TM log:
> 2019-01-23 11:07:07.479 [main] ERROR o.a.flink.shaded.curator.org.apache.curator.ConnectionState  - Connection timed out for connection string (localhost:2181) and timeout (15000) / elapsed (56538)
> org.apache.flink.shaded.curator.org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
>         at org.apache.flink.shaded.curator.org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:225)
>         at org.apache.flink.shaded.curator.org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:94)
>         at org.apache.flink.shaded.curator.org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:117)
>         at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.NamespaceImpl$1.call(NamespaceImpl.java:90)
>         at org.apache.flink.shaded.curator.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
>         at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.NamespaceImpl.fixForNamespace(NamespaceImpl.java:83)
>         at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl.fixForNamespace(CuratorFrameworkImpl.java:594)
>         at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:158)
>         at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:32)
>         at org.apache.flink.shaded.curator.org.apache.curator.framework.recipes.cache.NodeCache.reset(NodeCache.java:242)
>         at org.apache.flink.shaded.curator.org.apache.curator.framework.recipes.cache.NodeCache.start(NodeCache.java:175)
>         at org.apache.flink.shaded.curator.org.apache.curator.framework.recipes.cache.NodeCache.start(NodeCache.java:154)
>         at org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService.start(ZooKeeperLeaderRetrievalService.java:107)
>         at org.apache.flink.runtime.taskexecutor.TaskExecutor.start(TaskExecutor.java:277)
>         at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.start(TaskManagerRunner.java:168)
>         at org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManager(TaskManagerRunner.java:332)
>         at org.apache.flink.yarn.YarnTaskExecutorRunner.lambda$run$0(YarnTaskExecutorRunner.java:142)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
>         at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>         at org.apache.flink.yarn.YarnTaskExecutorRunner.run(YarnTaskExecutorRunner.java:141)
>         at org.apache.flink.yarn.YarnTaskExecutorRunner.main(YarnTaskExecutorRunner.java:75)
> 2019-01-23 11:07:08.224 [main-SendThread(localhost:2181)] WARN o.a.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>         at org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
>         at org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>
